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PREFACE 



The CRAY Y-MP C90 functional description manual describes the basic 
functions, design, and architecture of the CRAY Y-MP C90 computer 
system and its associated peripheral devices. This manual is written 
primarily for Cray Research customers. 

This manual is divided into the following tabbed sections. 

Section 1, "System Overview," introduces and describes the 
CRAY Y-MP C90 system components and support equipment. 

Section 2, "Mainframe," describes the basic hardware architecture and 
CPU instructions of the CRAY Y-MP C90 mainframe. A specification 
sheet is included at the end of this section. 

Section 3, "I/O Subsystem," describes the basic architecture and 
functions of the input/output subsystem (IOS). A specification sheet is 
included at the end of this section. 

Section 4, "SSD Solid-state Storage Device," describes the basic 
architecture of the SSD solid-state storage device model E. A 
specification sheet is included at the end of this section. 

Section 5, "Peripheral Equipment," describes the function of the disk 
drives and network interface equipment used by the CRAY Y-MP C90 
computer system. Specification sheets are included at the end of this 
section. 

Section 6, "Software Overview," provides an overview of the software 
available for the CRAY Y-MP C90 computer system. 

For the readers' convenience, a glossary is included. It defines many of 
the commonly used abbreviations and terminology associated with the 
CRAY Y-MP C90 computer system. 
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The following conventions are used throughout this manual. 



Convention 



Description 



Lowercase italic Variable information. 

X or x or x An unused value. 

n A specified value. 

(value) The contents of the register or memory location 

designated by value. 

Register bit Register bits are numbered from right to left as 

designators powers of 2. Bit 2° corresponds to the least 

significant bit of the register. One exception is the 
vector mask register. The vector mask register bits 
correspond to a word element in a vector register; 
bit 2 63 corresponds to element and bit 2° 
corresponds to element 63. Another exception is 
when the contents of the 32 1-bit semaphore 
registers are loaded into an S register. SMO goes 
into S register bit position 2 63 , SMI goes into S 
register bit position 2 62 , and so on. 

Number base All numbers used in this manual are decimal, unless 

otherwise indicated. Octal numbers are indicated 
with an 8 subscript. Exceptions are register 
numbers, the instruction parcel in instruction 
buffers, and instruction forms, which are given in 
octal without the subscript. 



The following list provides examples of the preceding conventions. 



Example Description 

Transmit (AJc) to Transmit the contents of the A register specified by the 
Si k field to the S register specified by the i field. 

167ixk Machine instruction 167. The x indicates that the ;' 

field is not used. 

Read a specified number of words from memory. 



Read n words 
from memory 

Bit 2 63 
1000 8 



The value represents the most significant bit of an S 
register or element of a V register. 

The number base is octal. 



VI 
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1 SYSTEM OVERVIEW 



The CRAY Y-MP C90 computer system is a powerful, general-purpose 
supercomputer. The large memory, fast clock speed, and powerful 
input/output (I/O) capabilities enable fast throughput, resulting in 
efficient use of supercomputing power. The CRAY Y-MP C90 computer 
system achieves extremely high multiprocessing rates by efficiently 
using the scalar and vector processing capabilities of the multiple central 
processing units (CPUs), combined with the system's random-access 
memory (RAM) and shared registers. 

A standard CRAY Y-MP C90 computer system consists of the following 
components: 



A mainframe 

An input/output subsystem model E (IOS-E) 

An optional SSD solid-state storage device model E (SSD-E) 

Mass storage devices such as disk and tape drives 

A maintenance workstation model E (MWS-E) 

An operator workstation model E (OWS-E) 

Network interfaces 

Power and cooling support equipment 



The following subsections introduce the system components. Figure 1-1 
shows the minimum configuration for a CRAY Y-MP C90 computer 
system with two I/O clusters. Refer to Figure 1-1 when reading the 
following subsections. 

Subsequent tabbed sections provide more detailed information on the 
mainframe, IOS-E, SSD-E, peripheral devices, and system software. 



HR-04028 



1-1 



ro 



Maintenance Channel 



Maintenance Channel 



(To SSD-E) 



(Front-end Computer) 

\ NSC EN641 j (Local Area Network) 



y High-performance 
Disks 




Economical 
Disk Farm 



Magnetic Tapes 
and/or STK 4400 ACS 



High-performance Parallel 
Interface - 200 Mbytes/s 



DO 

6 
*». 
o 

00 



KEY 

LOSP - Low-speed channel (6 or 20 Mbytes/s) 
HISP - High-speed channel (200 Mbytes/s) 
VHISP - Very high-speed channel (1800 Mbytes/s) 



A-10393 



Figure 1-1. Minimum Configuration for CRAY Y-MP C90 Computer System with Two I/O Clusters 



CRAY Y-MP C90 Functional Description Manual System Overview 



Mainframe 



A CRAY Y-MP C90 mainframe contains an I/O section, an 
interprocessor communication section, central memory, and a variable 
number of CPUs. All CPUs in multiprocessor systems share the I/O 
section, interprocessor communication section, and central memory. 

The I/O section provides high-speed data transfers to and from the IOS-E 
and SSD-E. The I/O section contains three types of I/O channels: 

• 6- or 20-Mbyte/s LOSP channels 

• 200-Mbyte/s HISP channels 
1800-Mbyte/s VHISP channels 

The 6- or 20-Mbyte/s channels carry control information between the 
mainframe and IOS-E. The 200-Mbyte/s channels carry data between 
the mainframe and IOS-E. The 1800-Mbyte/s channels carry data 
between the mainframe and the SSD-E. The quantity of each channel 
type varies with different system configurations and depends on the 
quantity of CPUs and I/O clusters. Basically, each of the CPUs in the 
mainframe is configured with one LOSP channel and either two HISP 
channels or one VHISP channel. 

The interprocessor communication section enables each mainframe CPU 
to synchronize operation and pass data with other CPUs. Central 
memory holds program code and data. Central memory is available in 
different sizes and configurations. 

Each CPU has a control section and a computation section. The control 
section determines instruction issue and use of the computation section, 
central memory, and I/O resources. The computation section consists of 
operating registers and functional units. 

The mainframe is designed to deliver optimum overall performance. 
Separate registers and functional units support both integer and 
floating-point computations in both vector and scalar processing modes. 

Vector processing uses a single instruction to perform multiple 
operations on sets of ordered data. Scalar processing is a sequential 
operation where one instruction produces one result. When two or more 
vector operations are chained together, two or more operations execute 
simultaneously. Therefore, the computational rate for vector processing 
greatly exceeds that of conventional scalar processing. Scalar operations 
complement the vector capability by providing solutions to problems not 
readily adaptable to vector techniques. 
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The start-up time for vector operations is short enough that vector 
processing is more efficient than scalar processing for vectors containing 
as few as two elements. This feature allows fast vector processing to be 
balanced with high-speed scalar processing. 

Multiple-processor systems allow the use of multiprocessing or 
multitasking techniques. Multiprocessing allows several programs to be 
run concurrently on multiple CPUs of a single mainframe. Multitasking 
allows two or more parts of a program to run in parallel, sharing a 
common memory space. 

Refer to Section 2, "Mainframe," for more information on the internal 
operation of the mainframe. 



IOS-E/SSD-E 



The I/O subsystem model E (IOS-E) and the SSD solid-state storage 
device model E (SSD-E) are contained in the same cabinet. All 
CRAY Y-MP C90 computer systems include an IOS-E. The SSD-E, 
however, is optional and may not be included with your system. The 
functions of these two system components are explained in the following 
subsections. 



IOS-E 



The IOS-E performs the following functions: 

• It controls all data transfers between the mainframe or optional 
SSD-E and peripheral devices such as disk drives and front-end 
computers. 

• It buffers all data transfers between the mainframe or SSD-E and 
peripheral devices. 

• It converts data to and from the formats used by peripheral devices. 

• It detects and corrects certain types of data errors that occur during 
transfers. 

The transfer rates between the IOS-E and other equipment vary. Data 
transfers between the IOS-E and either the mainframe or the SSD-E use 
200-Mbyte/s channels. The mainframe and IOS-E transfer control 
information using the 6- or 20-Mbyte/s channels. The transfer rate 
between the IOS-E and peripheral devices depends on the peripheral 
device. 
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The IOS-E consists of a variable number of I/O clusters and two 
workstation interfaces (WINs). Each I/O cluster contains one I/O 
processor multiplexer (IOP MUX), four auxiliary I/O processors 
(EIOPs), and up to 16 channel adapters. The IOP MUX controls data 
transfers between the IOS-E and mainframe or SSD-E. The four EIOPs 
control data transfers between the IOS-E and peripheral devices through 
the channel adapters. Each EIOP can support a maximum of four 
channel adapters. The number of clusters and channel adapters varies, 
depending on the number and types of peripheral devices in the system. 

Workstation interfaces allow an OWS-E and MWS-E to control and 
monitor the operation of the I/O clusters. Refer to Section 3, "I/O 
Subsystem," for more information on the internal operation of the IOS-E. 



SSD-E 



The SSD-E is an optional high-performance device used for temporary 
data storage. It is housed in the same cabinet as the IOS-E. The SSD-E 
transfers data between the mainframe's central memory and the SSD-E 
through 1800-Mbyte/s channels. The 1800-Mbyte/s channel operates 
under mainframe program control. The SSD-E can also connect to the 
IOS-E by means of a 200-Mbyte/s channel pair. The 200-Mbyte/s 
channels operate under IOS-E program control. Refer to Section 4, 
"SSD Solid-state Storage Device," for more information on the internal 
operation of the SSD-E. 



Disk Storage Units 



The CRAY Y-MP C90 computer system uses Cray Research disk drives 
for mass data storage. A disk channel adapter (DCA-1 or DCA-2) 
interfaces the disk drives to an EIOP. The disk channel adapter can 
transfer data between the EIOP and multiple disk drives at full speed, 
even when all the drives are operating simultaneously. Refer to Section 
3 for more information on the channel adapters. Refer to Section 5 for 
more information about the disk storage units. 



Tape Drives and Controllers 



The IOS-E provides an interface to tape drives and controllers. (Cray 
Research does not sell tape drives or controllers.) The TCA-1 channel 
adapter in the IOS-E connects to IBM compatible magnetic tape drives 
and controllers. Refer to Section 3, "I/O Subsystem," for more 
information on the channel adapters. 
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Operator and Maintenance Workstations 



The MWS-E and OWS-E (Figure 1-2) are based on a Sun 4/370 
SPARCstation, 12-slot chassis. The SPARC (Scalable Processor 
ARChitecture) is a Sun version of the reduced instruction set computer 
(RISC) architecture. A VMEbus is provided in slots 4 through 12 of the 
workstations. 

Both workstations run the SunOS 4.1.1 operating system and 
Open Windows 2.0 software; the OWS-E also runs the OWS-E software 
release. The Sun operating system is an enhanced version of UNIX; it 
combines features of UNIX System Laboratories, Inc.'s System V UNIX 
and Berkeley Software Distribution's version 4.3 UNIX. 

The OWS-E is part of the Cray Research computer system. The MWS-E 
is owned by Cray Research; it enables Cray Research engineers to 
perform system maintenance independently of any customer activity on 
the Cray Research computer system. 
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MWS-E Functions 



Figure 1-2. MWS-E and OWS-E Workstation Chassis 



The MWS-E provides multiple connections for hardware maintenance 
and monitoring of the CRAY Y-MP C90 computer system. The MWS-E 
supports Cray Research diagnostics, enhanced diagnostic displays, code 
simulation, and maintenance and error channels. It monitors 
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environmental conditions and can shut down the system if severe 
variances occur. The MWS-E also serves as a platform for remote 
support, with customer approval. 



OWS-E Functions 



The OWS-E provides a dedicated workstation that Cray Research 
analysts and customer operators use to operate, administer, and monitor a 
Cray Research computer system. The OWS-E is also used for system 
boot, dump, and clear operations and for software support and upgrades. 



Network Interfaces 



The CRAY Y-MP C90 computer system is designed to communicate 
easily and efficiently with front-end computer systems and computer 
networks. 

Standard front-end interfaces (FEIs) connect the I/O channels of the 
IOS-E to front-end computer channels. These connections provide input 
data to the system and receive output from the system for distribution to 
peripheral equipment. An FEI compensates for differences in channel 
widths, machine word size, electrical logic levels, and control signals. 

Some FEIs are housed in a stand-alone cabinet located near the host 
computer; others are installed directly into the front-end computer 
system. Operation of the FEI is transparent to both the front-end 
computer users and Cray Research system users. 

An optional fiber-optic link (FOL-3 or FOL-4) is available for some FEIs 
to provide equipment separation distances of up to 13,120 ft (4,000 m). 
The FOL is installed between the IOS-E and FEI and provides complete 
electrical separation from the CRAY Y-MP C90 computer system. 

Refer to "Network Interfaces" in Section 5 for more information on 
network interfaces. 



Power and Cooling Support Equipment 



The logic modules that comprise the mainframe, IOS-E, and SSD-E 
require special equipment to supply electrical power and to remove heat. 
The following subsections define the power and cooling support 
equipment and the warning and control system (WACS) used with 
CRAY Y-MP C90 computer systems. Refer to Section 5, "Peripheral 
Equipment," for power and cooling requirements for peripheral devices. 
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Power Equipment 



Cooling Equipment 



Motor-generator sets (MGSs) and power supplies provide electrical 
power to the logic modules and are housed in a stand-alone cabinet. The 
MGS is typically located in a separate power equipment room. The 
power supplies are housed in the mainframe cabinet. 

An MGS uses power from commercial power mains to generate the 
proper voltage and frequency used by the mainframe power supplies. 
Customers must supply one of the following commercial power sources 
to the MGSs: 

• 460 Vac, 3 phase, 60 Hz or 
398 Vac, 3 phase, 50 Hz 

The MGSs supply 208-Vac, 400-Hz power to the power supplies. The 
MGSs also isolate the system from transients and fluctuations from the 
commercial power mains. The number of required MGSs varies among 
systems. Depending on the system configuration, a CRAY Y-MP C90 
computer system requires one or two MGSs. An optional standby MGS 
is available. 

The power supplies are housed in the mainframe chassis and 
IOS-E/SSD-E cabinet. The power supplies convert the 208-Vac, 400-Hz 
voltage to the necessary DC voltages required for the logic modules. 
The number and types of power supplies are not optional for the 
customer. 



The CRAY Y-MP C90 computer system uses two models of heat 
exchanger units (HEUs): HEU-C90 and HEU-E/S. The mainframe is 
connected to the HEU-C90, which has two pumps and two heat exchange 
subassemblies. The IOS-E/SSD-E cabinet is connected to the HEU-E/S, 
which contains a single pump and heat exchanger subassembly. The 
HEUs transfer heat between the dielectric coolant and the refrigerant. 

The CRAY Y-MP C90 computer system uses refrigeration condensing 
units RCU-5 and RCU-6. The RCUs dissipate the heat transferred from 
the HEUs. 

NOTE: The RCU-5, RCU-6, and MGS-6 are configured with the 

majority of CRAY Y-MP C90 computer systems. However, 
some CRAY Y-MP C90 computer systems use different support 
equipment. 
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An HEU and RCU, along with customer-supplied chilled water, cool the 
CRAY Y-MP C90 computer system. The RCU is typically located in a 
separate equipment power room. 

Figure 1-3 is a simplified diagram of the cooling system for the 
CRAY Y-MP C90 computer system. Cooling is accomplished by three 
systems: one or two closed-loop dielectric-coolant systems, one 
closed-loop refrigerant system, and a customer-supplied chilled water 
system. 



Mainframe 



Dielectric-coolant Path 




HEU-C90 



£±D_ 



Heat Exchange 
Subassembly 



Heat Exchange 
Subassembly 



t I 



Refrigerant 
Path 



RCU 



Chilled 
Water 



IOS-E/ 
SSD-E 



Modules 



r 



Power-supply 
Plates 



Dielectric-coolant Path 



— i 



HEU-E/S 



Heat Exchange 
Subassembly 



Refrigerant 
Path 



RCU 



Chilled 
Water 



Figure 1-3. CRAY Y-MP C90 Cooling System Block Diagram 
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Each closed-loop dielectric-coolant system contains a pump that 
circulates chilled dielectric coolant (such as Fluorinert Liquid) through 
each module and power-supply mounting plate. The dielectric coolant 
absorbs heat generated by the modules and power supplies. It then flows 
to a heat exchange subassembly, where heat transfers from the dielectric 
coolant to the closed-loop refrigerant system. 




WARNING 




Prevent dielectric coolant from coming in contact with 
excessive heat. Dielectric-coolant liquid can 
decompose and produce hazardous by-products when 
exposed to excessive heat. Avoid inhaling vapors from 
Fluorinert Liquid that may have been overheated or 
exposed to an open flame or electrical arcing. If toxic 
vapors are inhaled, move the victim to fresh air and call 
for medical assistance. 



The manual Safe Use and Handling of Fluorinert Liquids, Cray Research 
publication number HR-0306, provides specific guidelines and 
information regarding Fluorinert Liquid. 

The closed-loop refrigerant system contains a compressor that circulates 
the refrigerant. As previously mentioned, the refrigerant absorbs heat 
from the dielectric coolant. The refrigerant is then circulated through a 
condenser, where heat transfers to customer-supplied chilled water. 

Cray Research recommends a water-supply temperature of 
approximately 50 °F (10 °C). Other chilled water specifications, such as 
flow rate and pressure-drop values, vary with different system 
configurations and actual water-supply temperatures. Cray Research 
provides additional information on these specifications during the site 
planning process. 



Warning and Control System 



The mainframe chassis and the IOS-E/SSD-E cabinet each contain a 
warning and control system (WACS). The primary function of the 
WACS is to protect the equipment and environment from damage by 
continuously monitoring conditions such as temperature and pressure 
within the cabinet. The WACS can remove electrical power from the 
cabinet if warning or fault conditions exist. The secondary function of 
the WACS is to report the warning and fault conditions to the MWS-E. 
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The WACS consists of printed circuit boards and a system control panel. 
The WACS monitors the following conditions: 

• DC voltages - provides voltage-level monitoring, loss-of-voltage 
protection, and over-voltage protection 

• AC voltages - monitors phase-to-phase and phase-to-neutral 
voltages 

• Temperatures - monitors module and coolant temperatures to 
protect the mainframe from overheating 

• Pressure - provides protection against high- or low-coolant 
pressure in the cooling system 

• Coolant flow - monitors dielectric-coolant flow to protect the 
cabinet from overheating 

The WACS operates on a 120- or 220- Vac, 60-Hz power source, or a 
100- or 220- Vac, 50-Hz power source. This power source is separate 
from the MGS power source. 
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2 MAINFRAME 



This section describes the major functional areas and special features of 
the CRAY Y-MP C90 mainframe and provides a summary of the Cray 
Assembly Language (CAL) instruction set. A CRAY Y-MP C90 
mainframe specification sheet is included at the end of this section. 



CPU Shared Resources 



All central processing units (CPUs) in the CRAY Y-MP C90 mainframe 
share the following resources (refer to Figure 2-1): 



Central Memory 



Central memory 

I/O section 

Interprocessor communication section 

Real-time clock 



Central memory consists of random-access memory (RAM) that is 
shared by all the CPUs and the I/O section. Each memory word consists 
of 80 bits: 64 data bits and 16 error-correcting bits (check bits). Storage 
for data and check bits is provided by bipolar complementary metal 
oxide semiconductor (BiCMOS) chips. In order to improve memory 
access speed, central memory is divided into multiple banks that can be 
active simultaneously. 

In each CPU, the operating registers, instruction buffers, and exchange 
package have access to central memory through memory ports. Each 
CPU has four ports. Each of these ports is 2 words wide, allowing up to 
eight simultaneous memory references from each CPU. The I/O section 
shares one port in each CPU. 

The CRAY Y-MP C90 mainframe central memory uses a single-byte 
error correcting/double-byte error detecting (SBCDBD) memory 
error-correcting scheme instead of the single-error correction/ 
double-error detection (SECDED t) method used in previous Cray 
Research machines. SBCDBD ensures that data written into central 
memory is read with consistent precision. If a single byte (4 bits) of data 



Hamming, R. W. "Error Detection and Correcting Codes." Bell System Technical 
Journal. 29.2 (1950): 147-160. 
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Figure 2-1. CRAY Y-MP C90 Mainframe Block Diagram 
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is corrupted, the byte is automatically corrected when the word is read 
from memory. If 2 or more bytes are corrupted, then a double-byte error 
has occurred and can be detected, but not corrected. 



I/O Section 



All CPUs share the I/O section of the computer system. The computer 
system supports three channel types, which are identified by their 
maximum transfer rates: 

• Low-speed (LOSP) channels - 6 or 20 Mbytes/s 

• High-speed (HISP) channels - 200 Mbytes/s 

• Very high-speed (VHISP) channels - 1,800 Mbytes/s 



Interprocessor Communication Section 



The interprocessor communication section of the computer system 
contains shared and semaphore registers to pass data and control 
information between CPUs. It also contains logic to enable any CPU in 
monitor mode to interrupt any other CPU and cause it to switch from 
user to monitor mode. These features are especially useful in 
multitasking environments. 

The shared and semaphore registers are divided into identical groups 
called clusters. Each cluster contains eight 32-bit shared address (SB) 
registers, eight 64-bit shared scalar (ST) registers, and thirty-two 1-bit 
semaphore (SM) registers. Each CPU is assigned to one cluster, giving it 
access to the registers in that cluster. 

The shared registers function as intermediate storage between CPUs and 
provide a way to transfer data between operating registers in different 
CPUs. One CPU loads a shared register from its address or scalar 
registers; other CPUs assigned to the same cluster can then transfer the 
data from the shared register to their own address or scalar registers. 
Within a CPU, data is transmitted between the SB and address registers 
and between the ST and scalar registers. 

Semaphore (SM) registers allow a CPU to temporarily suspend program 
operation in order to synchronize operation with other CPUs. Each CPU 
can set or clear each SM register in its assigned cluster and can perform a 
test and set instruction on those SM registers. A test and set instruction 
can result in a CPU holding further execution of instructions until the 
appropriate SM register is cleared by another CPU assigned to the 
cluster. Each CPU in the cluster can also transmit all 32 SM registers to 
or from a scalar register. 



HR-04028 2-3 



Mainframe CRAY Y-MP C90 Functional Description Manual 



Real-time Clock 



The CRAY Y-MP C90 mainframe has a real-time clock (RTC) that 
increments synchronously with program execution and may be used to 
compute the running time for a program in clock periods (CPs). The 
RTC is a 64-bit counter that increments each CP. 



CPU Computation Section 



Each CPU is an identical, independent computation section consisting of 
operating registers, functional units, and an instruction control network 
(refer again to Figure 2-1). The operating registers and functional units 
store and process three types of data: address, scalar, and vector. 



Address data controls internal operations and consists of information 
such as memory addresses, register designators and indexes. Address 
data is stored in the address (A) registers and intermediate address (B) 
registers and is processed in two dedicated functional units. 

Scalar data is any discrete numerical quantity that can be processed in 
functional units either singly or in operand pairs to produce a single 
scalar result. Scalar data is stored in the scalar (S) registers and the 
intermediate scalar (T) registers and is processed in four dedicated 
functional units. Scalar floating-point data is processed in one of three 
floating-point functional units; these functional units are also used to 
process vector floating-point data. 

Vector data refers to a set (or vector) of discrete numerical quantities that 
can be referenced by a single name. Vector data can be processed either 
singly or in operand pairs in special functional units to produce a vector 
result. Practically speaking, this means that a single instruction can 
result in the same operation being performed sequentially on a whole set 
of operands to produce a set of results. Vector data is stored in the vector 
(V) registers and is processed in five dedicated functional units. Vector 
floating-point data is processed in one of three floating-point functional 
units; these functional units are also used to process scalar floating-point 
data. 

The 32-bit integer product is a vector instruction designed for index 
calculation. A full-indexing capability is possible throughout central 
memory in either scalar or vector modes. The index can be positive or 
negative in either mode. Indexing allows matrix operations in vector 
mode to be performed on rows or on the diagonal as well as allowing 
conventional column-oriented operations. 

Data flow in a computation section is from central memory to registers 
and from registers to functional units. Results flow from functional units 
to registers and from registers to central memory or back to functional 
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units. Depending on the instruction sequence, data flows along either the 
scalar or vector path with two exceptions. In some cases, the scalar 
registers may provide one of the required operands for some vector 
operations performed in the vector functional units. Also, some scalar 
functional units return their results to an address register. 

The computation section performs integer or floating-point arithmetic 
operations. Integer arithmetic is performed in two's complement mode; 
floating-point quantities have signed magnitude representation. 

Integer (or fixed-point) operations are integer addition, integer 
subtraction, and integer multiplication. No integer division instruction is 
provided; the operation is accomplished through a software algorithm 
using floating-point hardware. 

Floating-point instructions allow addition, subtraction, multiplication, 
and reciprocal approximation operations. The reciprocal approximation 
instructions used in conjunction with other instructions enable 
floating-point division operations. 

The instruction set includes logical operations for AND, inclusive OR, 
exclusive OR, exclusive NOR, and mask-controlled merge operations. 
Shift operations allow the manipulation of either 64-bit or 128-bit 
operands to produce 64-bit results. With the exception of 32-bit integer 
arithmetic performed in the A register functional units, most operations 
are used in vector or scalar instructions. 

The following subsections describe the operating registers and their 
associated functional units. 



Operating Registers 



Each CPU has three primary and two intermediate sets of operating 
registers. The primary sets of operating registers are the address (A), 
scalar (S), and vector (V) registers. These registers are considered 
primary because functional units and central memory can access them 
directly. 

For the A and S registers, an intermediate level of registers exists. The A 
registers are supported by the intermediate address (B) registers, and the 
S registers are supported by the intermediate scalar (T) registers. The B 
and T registers cannot access the functional units, and serve mainly as a 
memory buffer for the primary registers. To reduce the number of 
memory reference instructions for scalar and address operations, block 
transfers are possible between the B and T registers and central memory. 
The V registers do not have associated intermediate registers. 
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Address Registers 



Scalar Registers 



Each CPU contains eight 32-bit A registers. The A registers serve a 
variety of applications, but are primarily used as address registers for 
memory references and as index registers. They provide values for shift 
counts, loop control, and channel I/O operations and receive values of 
population count and leading zeros count. In address applications, A 
registers index the base address for scalar memory references and 
provide both a base address and an address increment for vector memory 
references. 

Each CPU contains 64 B registers; each register is 32 bits wide. The B 
registers are used as intermediate storage for the A registers. Data is 
transferred between B registers and central memory, and between A and 
B registers. Typically, B registers contain data to be referenced 
repeatedly over a long time span, making it unnecessary to retain the data 
in either A registers or central memory. Examples of data stored in B 
registers are loop counts, variable array base addresses, and dimensions. 

The data stored in B registers are protected with parity bits. When a 
word is written into a B register, a set of parity bits is generated and 
stored with the data bits. This set of parity bits is compared to another 
set that is generated when a word is read out of the B register. An error 
is indicated when the two sets do not match. Parity errors set the register 
parity error (RPE) flag in the exchange package if interrupt on register 
parity error (IRP) mode is set and enabled. They also report the location 
of the error to the status register. 



Each CPU contains eight S registers; each register is 64 bits wide. The S 
registers are the principal scalar registers for a CPU. Scalar registers 
serve as the source and destination of scalar arithmetic and logical 
instructions. Scalar registers can also provide an operand for some 
vector operations. 

Each CPU contains 64 T registers; each register is 64 bits wide. The T 
registers are used as intermediate storage for the S registers. Data is 
transferred between T registers and central memory, and between T and 
S registers. 

The data stored in T registers are protected with parity bits. When a 
word is written into a T register, a set of parity bits is generated and 
stored with the data bits. This set of parity bits is compared to another 
set that is generated when a word is read out of the T register. An error 
is indicated when the two sets do not match. Parity errors set the register 
parity error (RPE) flag in the exchange package if interrupt on register 
parity error (IRP) mode is set and enabled. They also report the location 
of the error to the status register. 
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Vector Registers 



Functional Units 



Each CPU contains eight V registers. Each V register contains 128io 
elements; each element can store 64 bits of data. In vector operations, 
the 128 elements are processed in two groups, called pipes. One pipe 
processes the even-numbered elements while the other pipe 
simultaneously processes the odd-numbered elements. Each pipe is 
supported by an identical set of functional units. 

The effective length of a V register for any operation is controlled by the 
program-selectable vector length (VL) register. The VL register is an 
8-bit register that specifies the number of vector elements processed by 
the vector instructions. The contents range from lg through 200s. 

The vector mask (VM) register allows for the logical selection of 
particular elements of a vector. The VM register is a 128-bit register; 
each bit corresponds to an element of a vector register. Bit 2 127 
corresponds to element and bit 2° corresponds to element 127. The 
mask is used with vector merge and test instructions to allow operations 
to be performed on individual vector elements. 

V register data is protected with parity bits. When a word is written into 
a V register, a set of parity bits is generated and stored with the data bits. 
This set of parity bits is compared to another set that is generated when 
the word is read out of the V register. An error is indicated when the two 
sets do not match. Parity errors set the register parity error (RPE) flag in 
the exchange package if interrupt on register parity error (IRP) mode is 
set and enabled. They also report the location of the error to the status 
register. 

For more information on vector processing, refer to "Vector Processing" 
in this section. 



Instructions other than simple transfers of data or control operations are 
performed by specialized hardware known as functional units. Each unit 
implements an algorithm or a portion of the instruction set. Most 
functional units have independent logic, and all can operate 
simultaneously. 

All functional units perform their specific operation in a fixed amount of 
time; delays are impossible once the operands are delivered to the unit. 
Functional units are fully segmented. This means a new set of operands 
for unrelated computation can enter a functional unit each CP, even 
though the functional unit time can be more than 1 CP. Refer to 
"Pipelining and Segmentation" and "Functional Unit Independence" in 
this section for more information on pipelining, segmentation, and 
functional unit independence. 
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Address Functional Units 



Scalar Functional Units 



There are four groups of functional units: address, scalar, vector, and 
floating-point. The address, scalar, and vector functional units operate 
with one of the primary register types (A, S, and V) to support address, 
scalar, and vector processing. The floating-point functional units support 
either scalar or vector operations and accept operands from or deliver 
results to S or V registers. For timing purposes, central memory can also 
act as a functional unit for vector operations. 

The following subsections define the functions and the instructions 
executed by each functional unit. Refer to the following sections and 
subsections for additional information on functional units. 



Address functional units perform integer arithmetic on operands obtained 
from A registers and deliver the results to an A register. Integer 
arithmetic is explained later in this section. The two address functional 
units are described below. 

• The address add functional unit performs integer addition and 
subtraction; subtraction is performed by using two's complement 
arithmetic. Overflow is not detected. 

• The address multiply functional unit forms an integer product from 
two operands. No rounding is performed, and overflow is not 
detected. The unit returns only the least significant 32 bits of the 
product. 



Scalar functional units perform operations on operands obtained from S 
registers and usually deliver the results to an S register. The exception is 
the population/parity/leading zero count functional unit, which delivers 
its result to an A register. 

Four functional units are exclusively associated with scalar operations 
and are described below. Three floating-point functional units are used 
for both scalar and vector operations. Refer to "Floating-point 
Functional Units" in this section for more information on these units. 

• The scalar add functional unit performs integer addition and 
subtraction; subtraction is performed by using two's complement 
arithmetic. Overflow is not detected. 

• The scalar logical functional unit performs bit-by-bit manipulation 
of quantities obtained from S registers. 
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The scalar shift functional unit shifts the entire contents of an S 
register (single shift) or shifts the contents of two concatenated S 
registers (double shift) into a single resultant S register. Single 
shifts are end-off with zero fill, while double shifts can be circular 
fill. Shift counts are obtained from an A register or from a field of 
the instruction. 

The scalar population/parity/leading zero count functional unit 
counts the number of 1 bits in an operand obtained from an S 
register and then, depending on the instruction issued, returns the 
count either as a population or population parity count to an A 
register. For the leading zero function, the unit counts the number 
of bits preceding the first 1 bit in an operand obtained from an S 
register and returns the count to an A register. 



Vector Functional Units 



There are two parallel sets of vector functional units referred to as pipe 
and pipe 1. Pipe processes the even-numbered elements of a vector, 
while pipe 1 processes the odd-numbered elements. This duplication of 
functional units allows two pairs of elements to be processed at the same 
time and increases the efficiency of the vector processing operations. 

Most vector functional units perform operations on operands obtained 
from one or two vector registers or from a vector register and an S 
register. The shift, population/parity, and leading zero functional units 
require only one operand. Results from a vector functional unit are 
delivered to a V register. 

The functional units described in this section are used exclusively for 
vector operations. Three functional units are associated with both vector 
operations and scalar operations. Refer to "Floating-point Functional 
Units" in this section for more information on these functional units. 

• The vector add functional unit performs integer addition and 
subtraction for a vector operation and delivers the results to 
elements of a V register. The subtraction operation uses two's 
complement arithmetic. Overflow is not detected. 

• The vector shift functional unit shifts the entire contents of a vector 
register element (single shift) or the value formed from two 
consecutive elements of a V register (double shift). Shift counts 
are obtained from an A register and are end-off with zero fill. 
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The full vector logical functional unit performs a bit-by-bit 
manipulation of specified quantities for specific instructions. The 
full vector logical functional unit also performs vector register 
merge, compressed index, and the logical operations associated 
with the vector mask instructions. 

The second vector logical functional unit, when enabled, performs 
the same type of bit-by-bit manipulations as the full vector logical 
functional unit, but not for all instructions. The second vector 
logical functional unit cannot perform vector register merge, 
compressed index, and the logical operations associated with the 
vector mask instructions. A bit in the exchange package enables or 
disables the second vector logical functional unit. 

The vector population/parity/leading zero count functional unit 
performs population counts, parity checks, and leading zero counts 
for vector operations. These operations are identical to those 
performed in the scalar population/parity/leading zero count 
functional unit, except that the operands are the elements of a V 
register, and the results are returned to a V register. 



Floating-point Functional Units 



There are two parallel sets of floating-point functional units, with each 
set containing three functional units. These floating-point functional 
units perform floating-point arithmetic for both scalar and vector 
operations. The vector registers use both sets of functional units; one set 
processes the even-numbered elements, while the other set processes the 
odd-numbered elements. For an operation involving only scalar 
operands, only one set of floating-point functional units is used. 

When executing most vector instructions, operands are obtained from 
pairs of V registers, or from an S register and a V register, and results are 
delivered to a vector register. When a floating-point functional unit is 
used for a vector operation, the general description of vector functional 
units applies. When executing a scalar instruction, operands are obtained 
solely from S registers, and results are delivered to an S register. 

• The floating-point add functional unit performs addition and 
subtraction of operands in floating-point format. The result is 
normalized even when operands are unnormalized. The 
floating-point add functional unit detects overflow and underflow 
conditions; only overflow conditions are flagged. 
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Functional Unit Operations 



The floating-point multiply functional unit performs full- and 
half-precision multiplication of operands in floating-point format. 
The half-precision product is rounded; the full-precision product 
can be rounded or not rounded. This functional unit also generates 
a 32-bit integer product. 

Input operands must be normalized; the floating-point multiply 
functional unit delivers a normalized result only if both input 
operands are normalized. The floating-point multiply functional 
unit detects overflow and underflow conditions; only overflow 
conditions are flagged. 

The floating-point multiply functional unit recognizes both 
operands with zero exponents as a special case and performs an 
integer multiply operation. The result is considered an integer 
product, is not normalized, and is not considered out of range. 

The reciprocal approximation functional unit finds the approximate 
reciprocal of an operand in floating-point format. The input 
operand must be normalized; the floating-point reciprocal 
approximation functional unit delivers a correct result only if the 
input operand is normalized. The high-order bit of the coefficient 
is not tested, but is assumed to be a 1. The floating-point 
reciprocal approximation functional unit detects overflow and 
underflow conditions; both conditions are flagged. 



Functional units in a CPU perform logical operations, integer arithmetic, 
and floating-point arithmetic. Integer arithmetic and floating-point 
arithmetic are performed in two's complement. The following 
subsections explain the logical operations, the integer arithmetic, and the 
floating-point arithmetic used by the CRAY Y-MP C90 mainframe. 



Logical Operations 



Scalar and vector logical functional units perform bit-by-bit 
manipulation of 64-bit quantities. Instructions are provided for forming 
logical products, sums, exclusive ORs, equivalences, and merges. 

A logical product is the AND function, which is shown in the following 
example: 

Operandi: 1010 
Operand 2: 1100 
Result: 10 
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A logical sum is the inclusive OR function, which is shown in the 
following example: 

Operandi: 1010 
Operand 2: 1100 
Result : 1110 

A logical exclusive OR function is shown in the following example: 

Operandi: 1010 
Operand 2: 1100 
Result: 0110 

A logical equivalence is the exclusive NOR function, which is shown in 
the following example: 

Operandi: 1010 
Operand 2: 1100 
Result: 10 01 

The merge operation uses two operands and a mask to produce results. 
The bits of operand 1 are transmitted to the result when the mask bit is a 
1. The bits of operand 2 are transmitted to the result when the mask bit 
is a 0. The following example shows a merge operation: 

Operandi: 10101010 

Operand 2: 11001100 

Mask: 11110000 

Result: 1010110 



Integer Arithmetic 



All integers, whether 32 or 64 bits long, are represented in the registers 
as shown in Figure 2-2. The address add and address multiply functional 
units perform 32-bit arithmetic. The scalar add and vector add functional 
units perform 64-bit arithmetic. 

Two scalar (64-bit) integer operands are multiplied using the 
floating-point multiply instruction and one of two multiplication 
methods. The method used depends on the magnitude of the operands 
and the number of bits available to contain the product. The following 
paragraphs explain the 24-bit integer multiply operation and the method 
used for operands greater than 24 bits. 

The floating-point multiply functional unit recognizes a condition in 
which both operands have zero exponents as a special case. This case is 
treated as an integer multiplication operation, and a complete 
multiplication operation is performed with no truncation as long as the 
total number of bits in the two operands does not exceed 48 bit positions. 
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To multiply two integer numbers together, set each operand's exponent 
(bits 2 62 through 2 48 ) equal to and place each 24-bit integer value in bit 
positions 2 47 through 2 24 of the operand's coefficient field. To ensure 
accuracy, the least significant 24 bits must be O's. 
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Figure 2-2. Integer Data Formats 



When the floating-point multiply functional unit performs the operation, 
it returns the 48 high-order bits of the product as the result coefficient 
and leaves the exponent field as 0. The result is a 48-bit quantity in bit 
positions 2 47 through 2°; no normalization shift of the result is 
performed. If the 24 least significant bits of the operand coefficients 
were nonzero, the 48 low-order bits of the product could be nonzero and 
could generate a carry into the least significant of the 48 high-order bits 
returned, causing the result to be one larger than expected. 

As shown in Figure 2-3, if operand 1 is 4s and operand 2 is 6s, a 48-bit 
result of 308 is produced. Bit 2 63 follows the rules for multiplying signs, 
and the result is a signed-magnitude integer. An exclusive OR function 
on bits 2 63 of operands 1 and 2 is performed to derive the sign of the 





263 2 6 2 


248 2 47 


224 2 2 3 2° 


Operand 1 










o 


04 


Must be to Ensure Correct Product 




263 2 fi 2 


248 2 4 7 


224 2 2 3 2° 


Operand 2 




f\ .,. r\ 


n nr, 


Must be to Ensure Correct Product 


U 


\j 






263 2 6 2 


248 2 47 


2° 


Result 




n n 


n 


nnn 







A-10373 



Figure 2-3. 24-bit Integer Multiply Performed in a Floating-point Multiply Functional Unit 
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result. The format of integers expected by both the hardware and 
software is two's complement, not signed-magnitude; therefore, negative 
products must be converted to two's complement form. 

The second multiplication method is used when the operands are more 
than 24 bits long; multiplication is done by software, which forms 
multiple partial products and then shifts and adds the partial products. 

A second integer multiplication operation performs a 32-bit 
multiplication operation on the S/ operand and the Vk operand and puts 
the result in the Vi register. The operands must be shifted left before the 
operation begins. The S/ operand must be shifted left 31io places, 
leaving the operand in bit positions 2 62 through 2 31 ; bit positions 2 30 
through 2° must be equal to to ensure accuracy (refer to Figure 2-4). 
The Vk operand must be shifted left \6\q places, leaving the operand in 
bit positions 2 47 through 2 16 ; bit positions 2 15 through 2° must be equal 
to to ensure accuracy. Bits 2 63 through 2 48 are zero filled. The result 
of the multiply is right justified into positions 2 31 through 2°, and 
positions 2 32 through 2 63 are zero filled. 
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Figure 2-4. 32-bit Integer Multiply Performed in a Floating-point Multiply Functional Unit 



Although no integer division operation is provided, integer division can 
be carried out by converting the numbers to the floating-point format and 
then using the floating-point functional units. For more information on 
integer division, refer to "Floating-point Division Algorithm" in this 
section. 



Floating-point Arithmetic 



The scalar and vector instructions use floating-point arithmetic. The 
following subsections explain floating-point arithmetic. 
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Floating-point Data Format 
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Floating-point numbers are represented in a standard format throughout 
the CPU; this format is shown in Figure 2-5. The format has three fields: 
coefficient sign, exponent, and coefficient. 
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Figure 2-5. Floating-point Data Format 



This format is a packed representation of a binary coefficient and an 
exponent (power of two). The coefficient sign is located in bit position 
2 63 and is separated from the rest of the coefficient. If this bit is equal to 
0, the coefficient is positive; if this bit is equal to 1, the coefficient is 
negative. 

The exponent is represented as a biased integer number in bit positions 
2 62 through 2 48 ; each exponent is biased by 400008- Figure 2-6 shows 
the biased and unbiased exponent ranges. Bit 2 61 is the sign of the 
exponent; a indicates a positive exponent, and a 1 indicates a negative 
exponent. Bit 2 62 is the bias of the exponent. The floating-point format 
of the system allows the accurate expression of numbers to about 15 
decimal digits in the approximate range of 10~ 2466 through 10 +2466 . 

The coefficient is a 48-bit signed fraction; the sign of the coefficient is 
located in bit position 2 63 . Because the coefficient is in 
signed-magnitude format, it is not complemented for negative values. 



20000 8 - 
-20000 8 - 



Negative Range 
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Biased Exponent Range 

+- 40000 8 «* 

». 



Positive Range 
Positive Range 



Unbiased Exponent Range 
Figure 2-6. Biased and Unbiased Exponent Ranges 
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Figure 2-7 and the following steps show the relation between the biased 
exponent and the coefficient. The following steps show how to convert 
a floating-point number to its decimal equivalent. 



Binary Point 
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Figure 2-7. Internal Representation of a Floating-point Number 

1. Subtract the bias from the exponent to get the integer value of the 
exponent: 

400118 
- 400008 

lis =9io 

2. Multiply the normalized coefficient by the power of 2 indicated in 
the exponent to get the result: 



0.5634 8 x 2 9 = 563.40 8 = 371.5 



10 



A zero value or an underflow result is not biased and is represented as a 
word of all 0's. A negative is not generated by any floating-point 
functional unit, except in the case in which a negative is one operand 
going into the floating-point multiply or floating-point add functional 
unit. 



Normalized Floating-point Numbers 



A nonzero floating-point number is normalized if the most significant bit 
of the coefficient (bit 2 47 ) is nonzero. This condition implies that the 
coefficient has been shifted as far left as possible and that the exponent 
has been adjusted accordingly; therefore, a normalized floating-point 
number has no leading 0's in its coefficient. The exception is a 
normalized floating-point 0, which is all 0's. 

Anytime an integer is converted to a floating-point number, normalize 
the result before using it in a floating-point operation. Normalization is 
accomplished by adding the unnormalized floating-point operand to 0. 
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Floating-point Range Errors 



The reciprocal approximation functional unit must use normalized 
numbers to produce correct results. Using unnormalized numbers 
produces inaccurate results. 

The floating-point multiply functional unit does not require the use of 
normalized numbers to get correct results. However, more accurate 
results occur when normalized numbers are used. 

The floating-point add functional unit does not require normalized 
numbers to get correct results. The floating-point add functional unit 
does, however, automatically normalize all its results; unnormalized 
floating-point numbers may be routed through this functional unit to take 
advantage of this process. 



To ensure that the limits of the functional units are not exceeded, a range 
check is performed for overflow and underflow conditions on the 
exponent of each floating-point number coming into the functional unit. 
Bits 2 61 and 2 62 are checked; if both bits are equal to 1, the exponent is 
equal to or greater than 60000s, and an overflow condition is detected. 

When an overflow condition is detected, an interrupt occurs only if the 
interrupt-on-floating-point error (IFP) mode is set and enabled. In this 
case, the floating-point error (FPE) flag is set, causing an exchange 
sequence to occur. The IFP mode can be set or cleared by a user mode 
program. 

When an overflow condition occurs, the value returned to the result 
register depends on the functional unit used. For the floating-point add 
and floating-point multiply functional units, the calculated coefficient, 
together with a forced exponent of 60000s, is sent to the result register. 
For the reciprocal approximation functional unit, the returned result is 
the same except that bit 2 47 of the coefficient is set to 0. Refer to 
Figure 2-8 and Figure 2-9. 



Sign Exponent Coefficient, No Flag Set 

Figure 2-8. Floating-point Add and Floating-point Multiply Range Errors 
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To check for an underflow condition in the floating-point functional 
units, bits 2 61 and 2 62 are checked; if both are equal to 0, then the 
exponent is less than or equal to 17777s, ana * an underflow condition is 
detected. 

If an underflow condition is detected in the floating-point add or 
floating-point multiply functional unit, no fault is generated, and the 
word returned from the functional unit is all bits. Refer to Figure 2-8. 
The floating-point multiply functional unit will not detect an underflow 
condition if both exponents equal 0; instead an integer multiply operation 
is performed. 

Because the underflow condition of the result generated by the 
floating-point add functional unit is tested before the result is 
normalized, the normalized result can have a valid exponent as low as 
17721s- This occurs when the unnormalized result has an exponent of 
200008 and a coefficient of 1. In this case, no underflow is detected, and 
the calculated result is sent to the result register. 

An underflow condition is detected in the reciprocal approximation 
functional unit if either of the incoming operands has an exponent less 
than or equal to 20001s. If this condition occurs, the FPE flag will set 
only if IFP mode is set and enabled. The calculated coefficient, with bit 
2 47 set to 0, together with a forced exponent of 60000s, is sent to the 
result register. Refer to Figure 2-9. 
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Figure 2-9. Floating-point Reciprocal Approximation Range Errors 
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Floating-point Addition Algorithm 



Floating-point addition or subtraction is performed in a 49-bit register to 
allow for a sum that might carry into an additional bit position. The 
algorithm performs three operations: equalizing exponents, adding 
coefficients, and normalizing results. 
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To equalize the exponents, only the larger of the two exponents is 
retained. The coefficient of the smaller exponent is shifted right by the 
difference of the two exponents. Bits shifted out of the register are lost; 
no roundup occurs. Because the coefficient is only 48 bits long, any shift 
beyond 48 bits causes the smaller coefficient to become 0. 

After the two coefficients are equalized, they are added together. Two 
conditions are analyzed to determine whether an addition or subtraction 
operation occurs. The two conditions are the sign bits of the two 
coefficients and the type of instruction (an add or subtract) issued. The 
following list shows how the operation is determined. 

• If the sign bits are equal and an add instruction is issued, an 
addition operation is performed. 

• If the sign bits are not equal and an add instruction is issued, a 
subtraction operation is performed. 

• If the sign bits are equal and a subtract instruction is issued, a 
subtraction operation is performed. 

• If the sign bits are not equal and a subtract instruction is issued, an 
addition operation is performed. 

The last operation performed normalizes the results. To normalize the 
result, the coefficient is shifted left by the number of leading 0's (the 
coefficient is normalized when bit 2 47 is a 1). The exponent must also be 
decremented accordingly. If a carry across the binary point occurs 
during an addition operation, the coefficient is shifted right by 1 and the 
exponent increases by 1. 

The normalization feature of the floating-point add functional unit is 
used to normalize any floating-point number. The number is simply 
paired with a zero operand and sent through the floating-point add 
functional unit. 

A range check is performed on the result of all additions; refer to 
"Floating-point Range Errors" earlier in this subsection for more 
information on how the result is checked. 



Floating-point Multiplication Algorithm 

The floating-point multiply functional unit receives two 48-bit 
floating-point coefficients from either an S or V register as input. 
Multiplication is commutative, that is, A x B = B x A. The signs of the 
two operands are combined by an exclusive OR function, the exponents 
are added together, and the two 48-bit coefficients are multiplied 
together. Multiplying the 48-bit coefficients produces a product of either 
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95 or 96 bits. A 96-bit product is normalized as it is generated, but a 
95-bit product requires a left shift of 1 to generate the final normalized 
coefficient. If a shift occurs, the final exponent is reduced by 1 to reflect 
the shift. 

Because the result register (an S or V register) can hold only 48 bits in 
the coefficient, only the upper 48 bits of the 96-bit result are used. Some 
of the lower 48 bits are never generated. To adjust for this truncation, a 
constant is unconditionally added to the product. The average value of 
this truncation is 9.25 x 2" 56 , which is determined by adding all carries 
produced by all possible combinations that could be truncated and 
dividing the sum by the number of possible combinations. Nine carries 
are inserted at bit position 2~ 56 to compensate for the truncated bits. 

If the truncated bits are not compensated for, the resulting coefficient is 1 
bit position smaller than expected. With compensation, the resulting 
coefficient ranges from 1 too large to 1 too small in the 2~ 48 bit position, 
with approximately 99% of the values having zero deviation from what 
would have been generated had a full 96-bit product been present. 
Rounding is optional, but truncation compensation is not. The rounding 
method used adds a constant so that it is 50% high (0.25 x 2~ 48 ; high) 
38% of the time, and 25% low (0.125 x 2" 48 ; low) 62% of the time, 
resulting in a near-zero average rounding error. In a full-precision 
rounded multiplication operation, 2 round bits are entered into the 
summation at bit positions 2" 50 and 2" 51 and allowed to propagate. 

For a half-precision multiplication operation, round bits are entered into 
the summation at bit positions 2" 32 and 2" 31 . A carry resulting from this 
entry is allowed to propagate upward, and the 29 most significant bits of 
the normalized result are transmitted back. 

The result variations caused by this truncation and rounding are in the 
following ranges: 

-0.23 x 2" 48 to +0.57 x 2" 48 

or 

-8.17 x 10~ 16 to +20.25 x 10" 16 

With a 96-bit product and rounding equal to one-half the. least significant 
bit, the following result variation is expected: 

-0.5 x2" 48 to +0.5 x2~ 48 
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Floating-point Division Algorithm 



The CRAYY-MP C90 mainframe does not have a single functional unit 
dedicated to the division operation. Rather, the floating-point multiply 
and reciprocal approximation functional units together carry out the 
algorithm. The following paragraphs explain the algorithm and how it is 
used in the functional units. 

Finding the quotient of two floating-point numbers involves two steps. 
For example, to find the quotient A/B, first the B operand is sent through 
the reciprocal approximation functional unit to obtain its reciprocal, 1/B. 
Then, this result along with the A operand is sent to the floating-point 
multiply functional unit to obtain the product A x 1/B. 

The reciprocal approximation functional unit uses an application of 
Newton's method for approximating the real root of an arbitrary 
equation, F(x) = 0, to find reciprocals. Refer to Figure 2-10. 

To find the reciprocal, the equation F(x) = 1/x - B = must be solved. 
To do this, a number, A, must be found so that F(A) = 1/A - B = 0. That 
is, the number A is the root of the equation 1/x - B = 0. The method 
requires an initial approximation (or guess, which is shown as xo in 
Figure 2-10) sufficiently close to the true root (which is shown as x t in 
Figure 2-10). The initial approximation, xo, is then used to obtain a 
better approximation; this is done by drawing a tangent line (line 1 in 
Figure 2-10) to the graph of y = F(x) at the point [xo, F(xo)]. The 
x-intercept of this tangent line becomes the second approximation, xi. 
This process is repeated, using tangent line 2 to obtain X2, and so on. 

The following iteration equation is derived from this process: 

X( i+ i) = 2xi - xj 2 B = xi (2 - XiB) 

In the equation, X(j+i) is the next iteration, x, is the current iteration, and 
B is the divisor. Each X(j+i) is a better approximation than x, to the true 
value, x t . The exact answer is generally not obtained at once because the 
correction term is not exact. The operation is repeated until the answer 
becomes sufficiently close for practical use. 
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Figure 2-10. Newton's Method for Approximating Roots 



The CRAY Y-MP C90 mainframe uses this approximation technique 
based on Newton's method. A hardware look-up table provides an initial 
guess, xo, accurate to within 8 bits, to start the process. The following 
iterations are then calculated. 



Iteration Operation 

1 xi = x (2 - x B) 



Description 

The first approximation is done 
in the reciprocal approximation 
functional unit and is accurate to 
16 bits. 



2 x 2 = xi(2 - xiB) 



The second approximation is 
done in the reciprocal 
approximation functional unit 
and is accurate to 30 bits. 



3 x 3 = x 2 (2 - x 2 B) 



The third approximation is done 
in the floating-point multiply 
functional unit to calculate the 
correction term. 
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The reciprocal approximation functional unit calculates the first two 
iterations, while the floating-point multiply functional unit calculates the 
third iteration. The third iteration uses a special instruction within the 
floating-point multiply functional unit to calculate the correction term. 
This iteration is used to increase accuracy of the reciprocal 
approximation functional unit's answer to full precision. The 
floating-point multiply functional unit can provide both full- and 
half-precision results. 

The reciprocal iteration is designed for use once with each half -precision 
reciprocal generated. If the third iteration (the iteration performed by the 
floating-point multiply functional unit) results in an exact reciprocal, or 
if an exact reciprocal is generated by some other method, performing 
another iteration results in an incorrect final reciprocal. A fourth 
iteration should not be done. 

The following example shows how the floating-point multiply functional 
unit provides a full-precision result, computing the value of S1/S2. 

Step Operation Unit 

1 S3 = 1/S2 Reciprocal approximation 

functional unit 

2 S4 = [2 - (S3 * S2)] Floating-point multiply 

functional unit 

3 S5 = S4 * S3 Floating-point multiply 

functional unit using 
full-precision; S5 now equals 
1/S2 to 48-bit accuracy 

4 S6 = S5 * SI Floating-point multiply 

functional unit using 
full-precision rounding 

The reciprocal approximation in Step 1 is correct to 30 bits. By Step 3, it 
is accurate to 48 bits. This iteration answer is applied as an operand in a 
full-precision rounded multiplication operation (Step 4) to obtain a 
quotient accurate to 48 bits. Additional iterations may produce 
erroneous results. 
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Where 29 bits of accuracy are sufficient, the reciprocal approximation 
instruction is used with the half -precision multiply to produce a 
half-precision quotient in only two operations, as shown in the following 
example. 

Step Operation Unit 

1 S3 = 1/S2 Reciprocal approximation 

functional unit 

2 S6 = SI * S3 Floating-point multiply 

functional unit in half-precision 

The 19 low-order bits of the half -precision multiply results are returned 
as O's with rounding applied to the low-order bit of the 29-bit result. 

The following is another method of performing the division operation: 

Step Operation Unit 

1 S3 = 1/S2 Reciprocal approximation 

functional unit 

2 S5 = SI * S3 Floating-point multiply functional 

unit 

3 S4 = [2 - (S3 * S2)] Floating-point multiply functional 

unit 

4 S6 = S4 * S5 Floating-point multiply functional 

unit 

With this method, the correction to reach a full-precision reciprocal is 
done after the numerator is multiplied by the half-precision reciprocal 
rather than before the multiplication. 

The coefficient of the reciprocal produced by this alternative method can 
differ by as much as 2 x 2 -48 from the first method described for 
generating full-precision reciprocals. This difference can occur because 
one method can round up as much as twice, while the other method may 
not round at all. The first rounding can occur while the correction is 
generated, and the second rounding can occur when the final quotient is 
produced. Therefore, the reciprocals should be compared using the same 
method each time they are generated. Cray Fortran CFT and CFT77 use 
a consistent method to ensure that the reciprocals of numbers are always 
the same. 
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Double-precision Numbers 



The CPU does not provide special hardware for performing double- or 
multiple-precision operations. Double-precision computations with 
95-bit accuracy are available through software routines provided by Cray 
Research. 



CPU Control Section 



Each central processing unit (CPU) is assigned tasks and is controlled in 
the execution of those tasks through exchange sequences, fetch 
sequences, and issue sequences. These three sequences are closely 
related. For an initial deadstart program or a new program to run, an 
exchange sequence must occur. This sequence of steps sets several 
important parameters of the program in the CPU and may initialize some 
of the CPU's operating registers. A fetch sequence begins immediately 
after the exchange sequence and transfers a block of instructions from 
memory to an instruction buffer. The issue sequence then selects the 
instruction indicated by the program address (P) register, decodes it, 
determines whether the required registers or functional units are 
available, and if so, allows the instruction to be executed. 

As the instruction executes, the P register increments, causing new 
instructions to be selected from an instruction buffer and to move 
through the issue sequence. When a desired instruction is not currently 
in an instruction buffer, another fetch sequence occurs, retrieving another 
block of instructions from memory. This overall process continues until 
either the program terminates or is interrupted, at which time another 
exchange sequence occurs and the entire process begins again. 



The following subsections describe the exchange mechanism, the 
instruction fetch sequence, and the instruction issue sequence unique to 
each CPU. The programmable clock, the status register, and the 
performance monitor are also briefly described. 



Exchange Mechanism 



Each CPU uses an exchange mechanism for switching instruction 
execution from program to program. This exchange mechanism 
transfers blocks of program parameters (known as exchange packages) 
during a CPU operation, which is referred to as an exchange sequence. 

The following subsections describe the contents of the exchange package 
and explain the exchange sequence in more detail. 
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Exchange Sequence 



Exchange Package 



The exchange sequence moves the contents of an inactive exchange 
package from memory into the operating registers. Simultaneously, the 
exchange sequence retrieves data from the operating registers, uses it to 
construct the active exchange package, and then moves this exchange 
package back into memory. This swapping operation occurs in a fixed 
sequence when all computational activity associated with the active 
exchange package stops. 

The exchange sequence involves 16 memory read references and 16 
memory write references. A single 16-word block of memory data is 
used as the source of the inactive exchange package and the destination 
of the active exchange package. Word of the active exchange package 
is swapped with word of the inactive exchange package. The location 
of this block of data is specified by the contents of the XA register and is 
part of the active exchange package. 



An exchange package is a 16-word block of data stored in a reserved 
area of memory that contains the initial parameters for a particular 
computer program. In addition to initializing the program, these 
parameters also provide continuity if a program stops and restarts 
processing from one section of the program to the next. 

The exchange package includes the contents of the address (A) and 
scalar (S) registers. The contents of the intermediate address (B), 
intermediate scalar (T), vector (V), vector mask (VM), shared B (SB), 
shared T (ST), and semaphore (SM) registers are not saved in the 
exchange package. Data in these registers must be stored and replaced as 
required by the program supervising the object program or by any 
program that needs this data. 



Program Address Register Field 



The program address (P) register contents are stored in the program 
address register field of the exchange package. There are 32 bits in the P 
register, the lower 2 bits of which are used to select a particular 16-bit 
parcel of a memory word. The P register is wide enough to address 1 
gigaword of memory in C90 mode and 4 Mwords of memory in Y-MP 
mode. 

The address stored in the P register field is the address of the first 
instruction that issues when the program that corresponds to this 
exchange package executes. 



2-26 



HR-04028 



CRAYY-MP C90 Functional Description Manual Mainframe 



Instruction Base Address Register Field 

The instruction base address (IBA) register holds the base address of the 
user's instruction area (the location in memory where a program's 
instruction area begins). The absolute memory address for an instruction 
fetch sequence is formed by adding the contents of the IBA register to 
the 30 high-order bits of the contents of the P register. 



Instruction Limit Address Register Field 

The instruction limit address (ILA) register holds the limit address of the 
program's memory image area, which is used to determine the highest 
absolute memory address that can be accessed during an instruction fetch 
sequence. 

The absolute memory address used in an instruction fetch sequence must 
be an address between the IBA and ILA specified for the program being 
executed, or a program range error occurs. If the interrupt-on- 
program-range-error (IPR) mode is set in the exchange package, this 
error sets the program-range-error (PRE) interrupt flag. Regardless of 
the state of the IPR mode, a CPU interrupt will occur. 



Data Base Address Register Field 



The data base address (DBA) register holds the base address of the user's 
data area (the location in memory where a program's data area begins). 
Each time an instruction in the program makes a memory reference, the 
memory address generated by the instruction is added to the DBA to 
form the absolute memory address. 



Data Limit Address Register Field 



The data limit address (DLA) register holds the limit address of the 
user's data area, which is used to determine the highest absolute memory 
address the program can use for reading or writing data. 

Each time an instruction makes a memory reference, the absolute 
memory address generated is compared to the DLA and the DBA. The 
absolute memory address must be between the DBA and the DLA, or an 
operand range error occurs. If the interrupt-on-operand-range-error 
(IOR) mode is set in the exchange package, this error sets the 
operand-range-error (ORE) interrupt flag, causing a CPU interrupt. 

An instruction that attempts to read from a memory address outside the 
limits of the DBA and DLA still issues and finishes, but a zero value is 
transferred from memory. An instruction that attempts to write to a 
memory address ouside these limits issues, but no write operation occurs. 
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Interrupt Modes Field 



There are 16 user-selectable interrupt modes, which allow the 
programmer to select the conditions under which the active program can 
be interrupted. These modes are usually selected in the exchange 
package, and with the exception of IPR, FEX, and FNX, they must be 
enabled by setting the EIM (enable interrupt modes) flag. The EIM flag 
sets automatically on an exchange to non-monitor mode and clears on an 
exchange back to monitor mode. While in monitor mode, the EIM flag 
can be set or cleared by instructions 001302 or 001303, respectively. 

The interrupt modes are explained briefly in Table 2-1. 



Table 2-1. CRAY Y-MP C90 Interrupt Modes 


Mode 


Description 


IRP 


Enables an interrupt if a register parity error is detected while reading data from a 
register. 


IUM 


Enables an interrupt if an uncorrectable memory error is detected while reading data 
from memory. 


IFP 


Enables an interrupt if a floating-point error occurs. 


IOR 


Enables an interrupt if an operand range error occurs. 


IPR 


Enables the PRE interrupt flag to set if a program range error occurs. A program 
range error always causes an exchange, regardless of the state of IPR. This mode 
is not affected by the EIM flag. 


FEX 


Enables the EEX interrupt flag to set if an error exit instruction (000000) issues. 
Issuing an error exit instruction always causes an exchange, regardless of the state 
of FEX. This mode is not affected by the EIM flag. 


IBP 


Enables an interrupt if a breakpoint occurs. 


ICM 


Enables an interrupt if a correctable memory error is detected while reading data 
from memory. 


IMC 


Enables an interrupt if requested by the maintenance control unit (MCU). The MCU 
for the CRAY Y-MP C90 is the MWS-E. 


IRT 


Enables an interrupt if requested by the real-time clock. 


IIP 


Enables an interprocessor interrupt if requested by another CPU. 


110 


Enables an I/O interrupt if SIE is set and this CPU is the lowest-numbered CPU with 
110=1 and EIM=1. 


IPC 


Enables an interrupt if requested by the programmable clock. 


IDL 


Enables an interrupt if a deadlock occurs while the program is not in monitor mode. 
IDL has no effect in monitor mode. 
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Table 2-1. CRAY Y-MP C90 Interrupt Modes (continued) 


Mode 


Description 


IMI 


Enables an interrupt if a monitor mode instruction (001 ijk\ j*0) issues while the 
program is not in monitor mode. IMI has no effect in monitor mode. 


FNX 


Enables the NEX interrupt flag to set if a normal exit instruction (004000) issues. 
Issuing a normal exit instruction always causes an exchange, regardless of the state 
of FNX. This mode is not affected by the EIM flag. 



Interrupt Flags Field 



There are 16 interrupt flags, with one flag corresponding to each of the 
16 user-selectable interrupt modes. If a particular interrupt mode (except 
IPR, FEX, or FNX) is set and enabled and the specified error occurs, the 
corresponding interrupt flag is set, forcing an exchange. If the error 
occurs while the appropriate interrupt mode is set but not enabled, the 
interrupt is held. This condition can occur only while the program is in 
monitor mode. Enabling the interrupt modes, either by exchanging to 
user mode or by issuing instruction 001302, enables the held interrupt to 
be processed, at which time it sets the corresponding interrupt flag and 
forces an exchange. 

All interrupts or held interrupts, except PCI and ICP, are cleared on any 
exchange. PCI and ICP interrupts are held until they are cleared by 
instruction 001405 or 001402, respectively. 

Two interrupt flags, deadlock (DL) and monitor instruction interrupt 
(Mil), will set only if the corresponding interrupt modes are set and if the 
program is in non-monitor mode when the error occurs. 

The I/O interrupt (IOI) flag sets only if the system I/O interrupts enabled 
(SIE) flag is set and if the CPU to be interrupted is the lowest-numbered 
CPU with IIO interrupt mode set and enabled. The SIE flag can be set 
by any CPU issuing instruction 001600. After any CPU is interrupted by 
an I/O interrupt, this flag is cleared, disabling all I/O interrupts. The 
interrupted CPU resets the SIE flag by issuing instruction 001600 after it 
has serviced the I/O interrupt. 

Three errors always cause an exchange, regardless of the status of the 
EIM flag: a program range error, issuing instruction 000000, or issuing 
instruction 004000. The interrupt modes specifying these errors (IPR, 
FEX, and FNX) are used solely to enable setting the corresponding 
interrupt flags (PRE, EEX, and NEX respectively) should the appropriate 
error occur. Setting an interrupt flag in these cases makes it easier to 
determine the source of the error. 

The errors that set interrupt flags are explained briefly in Table 2-2. 
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Table 2-2. CRAY Y-MP C90 Interrupt Flags 


Flag 


Description 


RPE 


The register parity error flag sets if the IRP interrupt mode bit is set and enabled and 
a parity error occurs during a read operation from a B, T, V, SB, or ST register or 
from an instruction buffer. 


MEU 


The memory error (uncorrectable) flag sets if the IUM interrupt mode bit is set and 
enabled and an uncorrectable memory error occurs while reading data from memory. 


FPE 


The floating-point error flag sets if the IFP interrupt mode is set and enabled and a 
floating-point range error occurs in any of the floating-point functional units. 


ORE 


The operand range error flag sets if the IOR interrupt mode bit is set and enabled 
and a data reference is made outside the address boundaries specified in the DBA 
and DLA registers. 


PRE 


The program range error flag sets if the IPR interrupt mode bit is set and enabled 
and an instruction fetch is made outside the address boundaries specified in the IBA 
and ILA registers. A program range error always causes an exchange, regardless of 
the state of IPR. 


EEX 


The error exit flag sets if the FEX interrupt mode bit is set and enabled and an error 
exit instruction (000000) issues. Issuing an error exit instruction always causes an 
exchange, regardless of the state of FEX. 


BPI 


The breakpoint interrupt flag sets if the IBP interrupt mode bit is set and enabled and 
a write reference is made to an address within the breakpoint range. 


MEC 


The memory error (correctable) flag sets if the ICM interrupt mode bit is set and 
enabled and a correctable memory error occurs while reading data from memory. 


MCU 


The MCU interrupt flag sets if the IMC interrupt mode bit is set and enabled and the 
MCU interrupt signal becomes active on I/O channel 40. 


RTI 


The real-time interrupt flag sets if the IRT interrupt mode bit is set and enabled and a 
real-time interrupt request is received. 


ICP 


The interprocessor interrupt flag sets if the IIP interrupt mode bit is set and enabled 
and another CPU requests an interrupt of this CPU by issuing instruction 0014/1 . 


101 


The I/O interrupt flag sets if the SIE bit is set and this CPU is the lowest-numbered 
CPU with HO interrupt mode set and enabled when a LOSP or VHISP channel 
completes a transfer. 


PCI 


The programmable clock interrupt flag sets if the IPC interrupt mode bit is set and 
enabled and the counter in the programmable clock equals 0. 


DL 


The deadlock interrupt flag sets if the IDL interrupt mode bit is set, the program is not 
in monitor mode, and a deadlock condition occurs because all CPUs in a cluster are 
holding issue on a test and set instruction. 


Mil 


The monitor instruction interrupt flag sets if the IMI interrupt mode bit is set and a 
monitor mode instruction (001 ijk; }* 0) issues while the program is not in monitor 
mode. 


NEX 


The normal exit flag sets if the FNX interrupt mode bit is set and enabled and a 
normal exit instruction (004000) issues. Issuing a normal exit instruction always 
causes an exchange, regardless of the state of FNX. 
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Status Field 



Modes Field 



The status field contains 4 bits used to indicate the state of the CPU at 
the time an exchange occurs. These status bits are set during program 
execution and are therefore not user selectable. Table 2-3 briefly 
describes each of the status bits used. 



Table 2-3. CRAY Y-MP C90 Status Field Bit Assignments 


Status 


Description 


VNU 


The vectors not used bit sets if no vector instructions {077ijk or 1 AOijk through 
~\77ijk) were issued during the execution interval. 


FPS 


The floating-point status bit sets if a floating-point error occurred during the execution 
interval. 


WS 


The waiting on semaphore bit sets if a test and set instruction (0034/&) is holding 
issue in the CIP register. 


PS 


The program state bit is set by the operating system to denote whether a CPU 
concurrently processing a program with another CPU is the master or slave in a 
multitasking situation. 



There are four user-selectable modes that allow the programmer to select 
several modes of operation for the program. These modes are described 
briefly in Table 2-4. 



Table 2-4. CRAY Y-MP C90 Operating Modes 


Mode 


Description 


C90 


If C90 mode is set, the program can use the full CRAY Y-MP C90 instruction set; 
otherwise, only CRAY Y-MP instructions can be executed. 


ESL 


If enable second vector logical mode is set, the second vector logical functional unit 
is enabled, and if it is not busy, it has first priority to execute instructions 1 AOijk 
through 1 A5ijk. 


BDM 


If bidirectional memory mode is set, block read and write operations can operate 
concurrently. 


MM 


If monitor mode is set, the program can execute those instructions that are privileged 
to monitor mode. 
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Processor Number Field 



The contents of the 4-bit processor number field indicate which CPU 
performed the exchange sequence. This value is not initially stored in 
the exchange package before the program starts; it is a constant value 
inserted into the exchange package after the program has run and been 
exchanged. 



Cluster Number Field 



The 5-bit cluster number (CLN) field contains the number to be loaded 
into the CLN register. This number selects one of 17io available clusters 
of shared registers that the CPU can access. If the contents of the CLN 
register are 0, the CPU does not have access to any shared registers. The 
contents of the CLN registers in all CPUs are also used to determine a 
deadlock interrupt condition. 



Exchange Address Register Field 



Vector Length Register Field 



A Register Fields 



S Register Fields 



The 8-bit exchange address (XA) register field specifies the address of 
the first word of a 16-word exchange package loaded by an exchange 
sequence. The XA register contains only the 8 high-order bits of a 12-bit 
absolute memory address. The low-order bits of the address are always 
0, because an exchange package must begin on a 16-word boundary. The 
12-bit limit on the absolute memory address means that all exchange 
packages are located in the lower 4,096 (10000s) words of memory. 



The 8-bit vector length (VL) register field specifies the length of all 
vector operations performed by vector instructions and the effective 
number of elements held in the V registers. The value in the VL register 
can be changed during program execution by using the 00200A: 
instruction. 



The current contents of all A registers are stored in bits 2 s2 through 2 63 
of words through 7 during an exchange sequence. 



The current contents of all S registers are stored in bits 2° through 2 63 of 
words 8 through 15 during an exchange sequence. 
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Instruction Fetch Sequence 



Instruction Issue 



Programmable Clock 



Status Registers 



Performance Monitor 



An instruction fetch sequence retrieves program code from memory and 
places it in an instruction buffer. The program code is held in the 
instruction buffer before being delivered to the instruction issue registers. 



An instruction issue sequence is the series of steps performed to move an 
instruction from an instruction buffer through the issue registers and into 
execution. 



Each CPU has a programmable clock that generates periodic interrupts at 
specific preset intervals. Available intervals range between 9 and 
2 32 -1 CPs. Intervals shorter than 100 \is are not practical because of the 
monitor overhead involved in processing the interrupt. These 
instructions are privileged to monitor mode. 



Each CPU contains eight status registers. Memory error and register 
parity error information is reported to these registers, as well as 
information on the status of several bits in the active exchange package. 



The performance monitor tracks groups of hardware-related events. 
These results can be used to indicate the relative performance of a 
program. The performance monitor contains thirty-two 48-bit 
performance counters. 

Performance events are monitored only when operating in non-monitor 
mode. Entering monitor mode disables the performance counters. 

Two types of instructions are used with the performance monitor: user 
instructions and maintenance instructions. The user instructions allow 
the user to select and read the performance monitor. The maintenance 
instructions test the logic of the performance monitor. 
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Parallel Processing Features 

The CRAY Y-MP C90 mainframe has several special features that 
enhance the parallel processing capabilities inherent in the system. 
Parallel processing can mean different things in different environments; 
the following subsections discuss two types of parallel processing used: 



Parallel processing within a single CPU 
Parallel processing between two or more CPUs 



Parallel processing features within a single CPU include instruction 
pipelining and segmentation, functional unit independence, vector 
processing (described earlier in this section), multitasking, and 
Autotasking. The first two features are inherent hardware features of the 
CRAY Y-MP C90 mainframe; a programmer has little control over these 
features. The vector processing feature can be manipulated by the 
programmer to provide optimum throughput. Refer to the "Vector 
Processing" subsection later in this subsection for more information on 
vector processing. 

Parallel processing between two or more CPUs is called multiprocessing: 
the capability for several programs to run concurrently on multiple CPUs 
of a single mainframe. Included in this category are multitasking and the 
Autotasking feature of the CF77 Fortran compiling system. Multitasking 
is the capability to run two or more parts (or tasks) of a single program in 
parallel on different CPUs within a mainframe. Autotasking is automatic 
multiprocessing; it enables user programs to be automatically partitioned 
over multiple CPUs. 



Pipelining and Segmentation 



Pipelining means operation or instruction begins before a previous 
operation or instruction finishes. Pipelining is accomplished using fully 
segmented hardware. Segmentation means an operation is divided into a 
discrete number of sequential steps, or segments. Fully segmented 
hardware is designed to perform one segment of an operation during a 
single CP. At the beginning of the second CP, the partial results are sent 
to the second hardware segment in order to process the second step of the 
operation. During this second CP, the first hardware segment begins to 
perform the first step of the next operation. 

In the CRAY Y-MP C90 mainframe, all hardware is fully segmented. 
Therefore, pipelining occurs during all hardware operations such as 
exchange sequences, memory references, instruction fetch sequences, 
instruction issue sequences, and functional unit operations. The 
pipelining and segmentation features are critical to the execution of 
vector instructions. 
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Figure 2-11 shows the pipelining of three sets of scalar instructions 
through a segmented functional unit. 



Functional Unit 



Instruction Sequence 
S1 S1 + S3 
S3 S3 + S4 
S5 S5 + S6 



Scalar- 
Operands- 



' Scalar Results 



Segments 



CP 


Functional Unit Segment 
1 2 3 


1 


S1 Partial 
Result 




2 


S3 Partial 
Result 


S1 Partial 
Result 


3 


S5 Partial 
Result 


S3 Partial S1 Partial 
Result Result 


4 




S5 Partial S3 Partial 
Result Result 



Figure 2-11. Scalar Segmentation and Pipelining Example 



In the first CP, the first set of operands enters the first segment of the 
functional unit. During the next CP, the partial result is moved to the 
second segment of the functional unit, and the second pair of operands 
enters the first segment. This process continues each CP until the three 
operand pairs are completely processed. After 3 CPs, the first result 
leaves the functional unit and enters scalar register SI; the S3 and S5 
results will be available in successive CPs. 

The CRAY Y-MP C90 mainframe contains two sets of vector functional 
units: one for processing even-numbered elements and one for 
processing odd-numbered elements. This enables two pairs of elements 
to be processed in a single CP and almost doubles the vector processing 
rate. Figure 2-12 shows how a set of vector elements is pipelined 
through a dual vector functional unit. In the first CP, element of 
register VI and element of register V2 enter the first segment of the 
pipe functional unit, while element 1 of each register enters the pipe 1 
functional unit. During the next CP, the partial results move to the 
second segments of each functional unit, while element 2 of both vector 
registers enters the first segment of the pipe functional unit, and 
element 3 of both vector registers enters the first segment of the pipe 1 
functional unit. This process continues each CP until all elements are 
completely processed. 
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In this example, the functional units are divided into five segments; the 
dual functional units process up to ten different pairs of elements 
simultaneously. After 5 CPs, the first results leave the functional units 
and enter vector register V3; subsequent results are available at the rate 
of two results per CR 



Vector Register V1 
(Even Elements) 



Element 



Element 2 



Element N 



Vector Register V2 
(Even Elements) 



Element 



Element 2 



Element N 



Vector Register V1 
(Odd Elements) 



Element 1 



Element 3 



Element N 



Vector Register V2 
(Odd Elements) 



Element 1 



Element 3 



Element N 



PipeO 
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Segments 



Pipel 
Functional Unit 



12 3 4 5 

V v ' 

Segments 
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CP 


1 


Functional Unit Segment 
2 3 4 5 


1 


0,1 




2 


2,3 


0,1 


3 


4,5 


2, 3 0, 1 


4 


6,7 


4, 5 2, 3 0, 1 


5 


8,9 


6,7 4,5 2,3 0,1 


6 


10,11 


8,9 6,7 4,5 2,3 



Elements in Each 
► Segment during 
Successive CPs 
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Figure 2-12. Vector Segmentation and Pipelining Example 
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Functional Unit Independence 

The specialized functional units in the CRAY Y-MP C90 mainframe 
handle the arithmetic, logical, and shift operations. Most functional units 
are fully independent; any number of functional units can process 
instructions concurrently. Functional unit independence allows different 
operations such as multiplications and additions to proceed in parallel. 

For example, the operation represented by the equation 
A=(B + C)xDxE could be accomplished as follows. If operands B, C, 
D, and E are loaded into the S registers, three instructions are generated 
for the equation: one that adds B and C, one that multiplies D and E, and 
one that multiplies the results of these two operations. The 
multiplication of D and E is issued first, followed by the addition of B 
and C. The addition and multiplication operations proceed concurrently. 
Because the addition takes less time to run than the multiplication, both 
operations finish at the same time. The addition operation does not 
require additional processing time because it occurs during the same time 
interval as the multiplication operation. The results of these two 
operations are then multiplied to obtain the final result. 



Vector Processing 



Vector processing increases processing speed and efficiency by allowing 
an operation to be performed sequentially on a set (or vector) of 
operands through the execution of a single instruction. 

A vector is an ordered set of elements; each element is represented as a 
64-bit word. A vector is distinguished from a scalar, which is a single 
64-bit word. Examples of structures in Fortran that can be represented as 
vectors are one-dimensional arrays and rows, columns, and diagonals of 
multidimensional arrays. Vector processing occurs when arithmetic or 
logical operations are applied to vectors; it is distinguished from scalar 
processing in that it operates on many elements rather than on one. 

In vector processing, two successive pairs of elements are processed each 
CP. The dual vector pipes and the dual sets of vector functional units 
allow a pair of even-numbered elements and a pair of odd-numbered 
elements to be processed during the same CP. As each pair of operations 
is completed, the results are delivered to successive even- or 
odd-numbered elements of the result register. The vector operation 
continues until the number of elements processed by the instruction 
equals the count specified by the vector length (VL) register. 
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Parallel vector operations allow the generation of more than two results 
per CP. Parallel vector operations occur automatically in the following 
situations: 

• When successive vector instructions use different functional units 
and different V registers. 

• When successive vector instructions use the result stream from one 
vector register as the operand of another operation using a different 
functional unit. This process is known as chaining and is explained 
later in this subsection. 



Advantages of Vector Processing 



V Register Functions 



In general, vector processing is faster and more efficient than scalar 
processing. Vector processing reduces the overhead associated with 
maintenance of the loop-control variable (for example, incrementing and 
checking the count). In many cases, loops processed as vectors are 
reduced to a simple sequence of instructions without branching 
backwards. Central memory access conflicts are reduced, and finally, 
functional unit segmentation is exploited through vector processing 
because results from the units can then be obtained at the rate of two 
results per CP. 

Vectorization typically speeds up a code segment by an approximate 
factor of ten. If a segment of code that previously accounted for 50% of 
a program's running time is vectorized, the overall running time is 55% 
of the original running time (50% for the unvectorized portion plus 
0.1 x 50% for the vectorized portion). Vectorizing 90% of a program 
causes running time to drop to 19% of the original execution time. 



The V registers are used solely for vector processing. This is unlike the 
A and S registers, which are used for many secondary functions. Vector 
processing allows a single instruction to perform a specified operation 
sequentially on a set (vector) of operands, to produce a vector of results. 
Examples of these sets or vectors may be rows or columns of a matrix or 
elements of a table. 

The contents of a V register are transferred to or from central memory by 
means of a block transfer. A vector block transfer is accomplished by 
specifying a first word address in central memory, an increment or 
decrement value for the central memory address, and a vector length. 
The transfer begins with the first element of the V register and proceeds 
at a maximum rate of two words per clock period (CP); this rate can be 
affected by central memory conflicts. A central memory conflict 
interrupts the vector data stream and can occur in chained operations 
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Vector Instructions 



Vector Chaining 



(although they do not inhibit chaining). Any interruption in the vector 
data stream adds proportionally to the total execution time of vector 
operations. 

Single-word data transfers can also be made between an S register and an 
element of a V register. 



Vector instructions reference V registers by specifying the register 
number in the i, j, or k field of the instruction. Refer to the "Instruction 
Formats" subsection later in this section for information on instruction 
fields. Operations on vector registers always start with element 0. 
Individual elements of a V register are designated by octal numbers 
ranging from 00 through 177. These numbers appear as subscripts to 
vector register references. For example, V627 refers to element 27 of 
V register 6. 

Vector instructions reserve V registers as either operands or results. If 
the register is reserved as an operand, it cannot be used as an operand or 
result until the operand reservation clears. A vector register can be used 
as both an operand and result register for the same vector instruction. If 
a register is reserved as a result, it can be used as an operand through a 
process called chaining. Refer to the subsection "Vector Chaining" in 
this section for more information on chaining. 

No reservation is placed on the VL register during vector processing. If 
a vector instruction uses an S register as an operand, no reservation is 
placed on the S register. Conflicts can occur between vector and scalar 
operations involving floating-point operations and memory access. With 
the exception of these operations, the floating-point functional units are 
always available for scalar operations. The S and VL registers can be 
modified after the vector instruction issues without affecting the vector 
operation. The A0 and AA; registers in a vector memory reference can 
also be modified after the instruction issues. 

Because most transfers to or from registers are done in blocks of data, 
instructions that transfer data between V registers and central memory 
reserve a port, and functional unit instructions reserve the appropriate 
functional unit. 



The CRAY Y-MP C90 mainframe allows a vector register reserved for 
results to become the operand register of a succeeding instruction. This 
process, called chaining, allows a continuous stream of operands to flow 
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through the vector registers and functional units. Even when a vector 
load operation pauses because of memory conflicts, chained operations 
may proceed as soon as data is available. 

This chaining mechanism allows chaining to begin at any point in the 
result vector data stream. The amount of concurrency in a chained 
operation depends on the relation between the issue time of the chaining 
instruction and the arrival time of the result data stream. For full 
chaining to occur, the chaining instruction must issue and be ready to use 
element of the result at the same time element arrives at the V 
register. Partial chaining occurs if the chaining instruction issues after 
the arrival of element of the result vector data stream. 

Figure 2-13 shows how the results of four instructions are chained 
together. The instruction chaining sequence performs the following 
operations: 

1. Reads a vector of integers from central memory to register VO. 

2. Adds the contents of register VO to the contents of register VI and 
sends the results to register V2. 

3. Shifts the results obtained in Step 2 and sends the results to 
register V3. 

4. Forms the logical product of the shifted sum obtained in Step 3 
with the contents of register V4 and sends the results to register V5. 




V1 Register 



VO Register {- 



V4 Register 



V2 Register V3 Register [ 



V5 Register 




Memory Path 



Vector Add 
Functional Unit 



Vector Shift 
Functional Unit 



Vector Logical 
Functional Unit 
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Figure 2-13. Vector Chaining Example 
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As soon as the first two elements from central memory arrive at register 
VO, they are added to the first two elements of vector register VI. 
Subsequent pairs of elements are pipelined through the segmented 
functional unit, so that a continuous stream of results is sent to the 
destination register, which is register V2. As soon as the first two 
elements arrive at register V2, they are used as operands for the shift 
operation. The results are sent to register V3, which immediately 
becomes the source of one of the operands necessary for the logical 
operation between registers V3 and V4. The results of the logical 
operation are then sent to register V5. 



Multiprocessing and Multitasking 



Users of CRAY Y-MP C90 mainframes can take advantage of parallel 
processing features known as multiprocessing and multitasking; this 
category also includes microtasking. 

Parallel processing between two or more CPUs is called multiprocessing: 
the capability for several programs to be run concurrently on multiple 
CPUs of a single mainframe. Up to n programs can run simultaneously 
on a machine with n CPUs. 

Multitasking is a more recent and complex enhancement than 
vectorization. Multitasking is the capability to run two or more parts, or 
tasks, of a single program in parallel on different CPUs within a 
mainframe. To take advantage of this feature, a program must be 
logically or functionally divided to allow two or more tasks to run 
simultaneously (that is, in parallel). For example, a weather modeling 
application in which the northern hemisphere calculation is one section 
of code and the southern hemisphere is another section of code. Distinct 
code segments are not needed; the same code could run on multiple 
processors simultaneously, with each processor handling separate data. 
Theoretically, the gain from multitasking can be calculated in the 
following manner. A program running on a dedicated system in wall 
clock time (t) could run in a time as short as tin if multitasked, or 
modified to use n or more parallel tasks on a machine with n CPUs. 

Actually, a speed-up factor of n is not quite attainable because of the 
additional processing operations (overhead) needed to implement 
multitasking. In some instances, multitasking can actually increase a 
program's execution time if the multitasking overhead decreases 
performance more than parallel processing time improves it. This is a 
situation that must be investigated before investing too much time and 
effort. 



HR-04028 2-41 



Mainframe CRAY Y-MP C90 Functional Description Manual 



The following list includes some factors that limit the maximum 
improvement potential of a program: 

• When not all parts of a program can be divided into parallel tasks. 

• When those parts that can be multitasked may have dependencies 
on one another that result in one or more tasks having to wait for 
other tasks to finish. 

• When the multitasking feature incurs additional processing time 
that is added to the program. 

The CFT compiler on the CRAY Y-MP C90 mainframe automatically 
uses the vector hardware to perform operations on inner DO loops that 
have no data dependencies. Once such optimizing is complete, a single 
processor can work no faster, but more than one processor could operate 
on separate parts of the data simultaneously to achieve results faster. 
Microtasking permits multiple processors to work on a Fortran program 
at the DO-loop level. The name microtasking was chosen because 
multiprocessing is efficient even at a DO-loop level where the task size, 
or granularity, may be small. 

Microtasking also works well when the number of processors available is 
unknown or may vary during program execution. This means that 
microtasked jobs do not require a dedicated system, although they 
perform best in a dedicated environment with no competing jobs. 

As stated before, advanced programming skills and tools are needed to 
use multiprocessing, multitasking, and microtasking concepts efficiently. 



Autotasking 



CRAY Y-MP C90 mainframe analysts and programmers can use 
Autotasking (automatic multitasking) to automatically detect whether 
portions of their programs can be run in parallel. Autotasking is an 
extension of multiprocessing and microtasking and is designed to make 
parallel processing easier to use. Autotasking alters a Fortran program to 
allow it to run simultaneously on multiple CPUs. 

Autotasking is available on CRAY Y-MP computer systems beginning 
with UNICOS release 4.0 and CF77 release 3.0. Refer to CF77 
Compiling System, Volume 4: Parallel Processing Guide, CRI 
publication number SG-3074, for more detailed information on 
Autotasking. 
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CPU Instructions 



The following subsections explain the instruction formats, instruction 
differences between the Y-MP mode and C90 mode, special register 
values, special Cray Assembly Language (CAL) syntax forms, and 
monitor mode instructions used by the CRAY Y-MP C90 computer 
system. A CPU instruction summary is also included. 



Notational Conventions 



The following conventions are used throughout this section: 

All numbers are decimal numbers unless otherwise indicated. 

Letters X or x or x represent an unused value. 

Register bits are numbered from right to left as powers of 2. 

The letter n represents a specified value. 

The notation (value) specifies the contents of a register or memory 
location as designated by value. 

Variable parameters are in italic type. 

The vector mask bits are contained in the VM and VM1 registers. 
The bits of the VM register correspond to vector elements 
through 63, and the bits of the VM1 register correspond to vector 
elements 64 through 127, as shown in Figure 2-14. 
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Figure 2-14. Vector Mask Bits 



Instructions can be 1 parcel (16 bits), 2 parcels (32 bits), or 3 parcels (48 
bits) long. Instructions are packed 4 parcels per word, and parcels are 
numbered through 3 from left to right. Any parcel position can be 
addressed by branch instructions. A 2- or 3-parcel instruction can begin 
in any parcel of a word and can span a word boundary. For example, a 
2-parcel instruction beginning in parcel 3 of word 1 ends in parcel of 
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word 2. Parcels 0, 1, and 2 of word 1 do not need to be filled with all 
zeros or ones (padded). Figure 2-15 shows the general instruction 
format. 





First Parcel 








Second Parcel 








Third Parcel 




( 

g 


h i j 


k 


f 


m 


\ 


t 


\ 

n 


Fields 


4 


3 


3 


3 


3 




16 




16 


Number of Bits 

A-10388 



Figure 2-15. General Instruction Format 

The first parcel is divided into five fields, and the second and third 
parcels each contain a single field. The four variations of this general 
format are listed below. 

• 1 -parcel instruction format with discrete j and k fields 

• 1 -parcel instruction format with combined; and k fields 

• 2-parcel instruction format with combined i, j, k, and m fields 
(Y-MP mode only) 

• 3-parcel instruction format with combined m and n fields 

Each format uses the fields differently and is described in detail in the 
following subsections. 



Instruction Formats 



The following subsections explain the instruction formats, as well as the 
instruction differences between Y-MP mode and C90 mode. 



1 -parcel Instruction Format with Discrete/ and k Fields 



The most common of the 1 -parcel instruction formats uses the i, j, and k 
fields as individual designators for operand and result registers (refer to 
Figure 2-16). The g and h fields define the operation code, the i field 
designates a result register, and the) and k fields designate operand 
registers. Some instructions ignore one or more of the i, j, and k fields. 
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Figure 2-16. 1 -parcel Instruction Format with Discrete,/ and k Fields 



The following types of instructions use this format: 



Arithmetic 
Logical 
Vector shift 
Scalar double shift 
Floating-point constant 



1 -parcel Instruction Format with Combined J and k Fields 



Some 1 -parcel instructions use the j and k fields as a combined 6-bit field 
(refer to Figure 2-17). The g and h fields contain the operation code, and 
the i field usually designates a result register. The combined; and k 
fields contain a constant or an intermediate address (B) or intermediate 
scalar (T) register designator. The 005yA: branch instruction and the 
following types of instructions use the 1 -parcel instruction format with 
combined; and k fields: 



6-bit constant 

B or T register block memory transfer 

B or T register data transfer with address (A) or scalar (S) register 

Scalar single shift 

Scalar mask 



g h i jk Fields 



4 


3 


3 


6 


Number of Bits 


Operation 
Code 


t 

Constant 

Registe 

Designate 


or 

r 

3rs 




F 
R 


Res 
eg 


>ul1 
ste 


>r 


A-10390 



Figure 2-17. 1 -parcel Instruction Format with Combined j and k Fields 
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2-parcel Instruction Format with Combined /, /, k, and m Fields 



The 2-parcel instruction format uses the combined i, j, k, and m fields to 
contain a 24-bit address that allows branching to an instruction parcel 
(refer to Figure 2-18). A 7-bit operation code (gh) is followed by an ijkm 
field. The high-order bit of the i field (i2) is equal to 0. 
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Figure 2-18. 2-parcel Instruction Format with Combined i,j, k, and m Fields 



3-parcel Instruction Format with Combined m and n Fields 



There are three distinct 3-parcel instruction formats using the combined 
m and n fields. 

The format for a 32-bit immediate constant uses the combined m and n 
fields to hold the constant. The 7-bit g and h fields contain an operation 
code, and the 3-bit i field designates a result register. The instructions 
using this format transfer the 32-bit mn constant to an A or S register. 

NOTE: The m field of the 3-parcel instruction contains bits 2° through 
2 15 of the expression, while the n field contains bits 2 16 through 
2 31 of the expression. When the instruction is assembled, the 
mn field is reversed and actually appears as the nm field when 
used as an expression. 

The format for a C90 mode branch instruction uses the combined m and 
n fields to hold the memory branch address. The C90 mode is explained 
in the next subsection. The 7-bit g and h fields (and, in one case, bit 2 2 
of the i field) contain an operation code. 

The format for A or S register memory references uses the combined m 
and n fields to hold the memory reference address. This format uses the 
4-bit g field for an operation code, the 3-bit h field to designate an 
address index register, and the 3-bit i field to designate a source or result 
register. 
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Figure 2-19 shows the three applications for the 3-parcel instruction 
format with combined m and n fields. Remember that the m and n fields 
are reversed when a 3-parcel instruction is assembled. 
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Figure 2-19. 3-parcel Instruction Format with Combined m and n Fields 



Instruction Differences between Y-MP Mode and C90 Mode 

The CRAY Y-MP C90 mainframe offers two modes of operation: a Y-MP 
compatibility mode and C90 mode. These two modes are referred to as 
as Y-MP mode and C90 mode, respectively. In Y-MP mode, all 
instructions defined within the Y-MP mode of the CRAY Y-MP 
computer system will function as specified for that mode. In addition, 
many new C90 mode instructions are supported in the Y-MP mode. 
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Table 2-5 lists the new instructions for the CRAY Y-MP C90 computer 
system or those that function differently in the CRAY Y-MP C90 and 
CRAY Y-MP computer systems. 



Table 2-5. Instruction Comparisons 


Machine 
Instruction 


CAL Syntax 


CRAY Y-MP C90 
Function Description 


CRAY Y-MP 
Function Description 


001000 


PASS 


This is a no-operation 
instruction. 


This is a no-operation 
instruction. 


001 2/2 


DI,A/ 


Disable channel (A/) interrupts. 


Disable channel (A/) 
interrupts. 


001 2;3 


EI.A; 


Enable channel (A/) interrupts. 


Enable channel (A/) 
interrupts. 


001302 


EMI 


Enable monitor mode interrupt 
modes. 


Enable monitor mode 
interrupt modes. 


001303 


DMI 


Disable monitor mode interrupt 
modes. 


Disable monitor mode 
interrupt modes. 


001406 


ECI 


Enable the programmable clock 
interrupt request. 


Enable the programmable 
clock interrupt request. 


001407 


DCI 


Disable the programmable 
clock interrupt request. 


Disable the programmable 
clock interrupt request. 


001600 


ESI 


Enable system I/O interrupts. 


Enable system I/O interrupts. 


001 Ijk 


BP.Jt Ay 


Transmit (A/) to breakpoint 
address k. 


N/A 


00200& 


VL Ak 


Transmit (Ak) to VL register, 
(maximum VL= 128) 


Transmit (AA;) to VL register, 
(maximum VL = 64) 


002301 


EBP 


Enable interrupt on breakpoint. 


N/A 


002401 


DBP 


Disable interrupt on breakpoint. 


N/A 


002704 


CPA 


Complete port reads and writes. 


N/A 


002705 


CPR 


Complete port reads. 


N/A 


002706 


CPW 


Complete port writes. 


N/A 


0030;1 1 


VM1 Sj 


Transmit {Sj) to VM upper 
register. 


N/A 



1 This instruction is supported only in C90 mode. 
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Table 2-5. Instruction Comparisons (continued) 


Machine 
Instruction 


CAL Syntax 


CRAY Y-MP C90 
Function Description 


CRAY Y-MP 
Function Description 


0034/* 


SM.AJfc 1,TS 


Test and set semaphore [Ak). 
02 = 1) 


N/A 


0036/Jfc 


SM.AJfc 


Clear semaphore (Ait). (/2 = 1 ) 


N/A 


0037/* 


SM,A£ 1 


Set semaphore (A£). (/2 = 1) 


N/A 


0051./* 


Jinv Bjk 


Jump to (B;*). (maintenance 
only, invalidate IB) 


N/A 


006000 «m 1 


J exp 


Jump to nm. 


N/A 


0064/1 nm 1 


JTS/1 exp 


Branch to nm if SMjk = 1 ; else 
set SMjk = 1 . 02 = 0) 


N/A 


0064/1 nm 1 


JTS.Afc exp 


Branch to nm if SM.Afc = 1 ; else 
setSM,A£=1. 02 = 1) 


N/A 


007000 nm 1 


R exp 


Return jump to nm; set BOO to 
(P)+3. 


N/A 


01 0000 nm 1 


JAZ exp 


Jump to nm if (A0) = 0. 


N/A 


011 000 nm 1 


JAN exp 


Jump to nm if (A0) * 0. 


N/A 


01 2000 nm 1 


JAP exp 


Jump to nm if (A0) & 0. 


N/A 


01 3000 nm 1 


JAM exp 


Jump to nm if (A0) < 0. 


N/A 


01 4000 n/w 1 


JSZ exp 


Jump to nm if (SO) = 0. 


N/A 


01 5000 nm 1 


JSN exp 


Jump to nm if (SO) * 0. 


N/A 


01 6000 nm 1 


JSP exp 


Jump to nm if (SO) a 0. 


N/A 


01 7000 nm 1 


JSM exp 


Jump to nm if (SO) < 0. 


N/A 


023/01 


Ai VL 


Transmit (VL) to Ai. (maximum 
VL = 128) 


Transmit (VL) to Ai. 
(maximum VL = 64) 


026//4 


Ai SB.A/,+1 


Transmit (SB) designated by 
(A/) to Ai; increment by 1 . 


N/A 


026i/5 


A/ SB/,+1 


Transmit (SB;) to Ai; increment 
by1. 


N/A 


026i/6 


Ai SB, A/ 


Transmit (SB) designated by 
(A/) to Ai. 


N/A 



1 These instructions are supported only in C90 mode. 
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Table 2-5. Instruction Comparisons (continued) 


Machine 
Instruction 


CAL Syntax 


CRAYY-MPC90 
Function Description 


CRAY Y-MP 
Function Description 


027//6 


SB,A/ A/ 


Transmit (Ai) to SB designated 
by (A/). 


N/A 


033*71 


A/ CE,A; 


Transmit error flag of channel 
(A/) to Ai (/ * 0); include done 
flag. 


Transmit error flag of channel 
(A/) to A/. (/*0) 


040/20 nm 


Si Si.exp 


Transmit nm to Si bits 2°-2 31 . 
(bits 2 32 -2 63 are unchanged) 


N/A 


040/40 nm 


Si exp.Si 


Transmit nm to Si bits 2 32 -2 63 . 
(bits 2°-2 31 are unchanged) 


N/A 


072//6 


Si ST.A/ 


Transmit (ST) designated by 
(A/) to Si. 


N/A 


073/1 1 


Si VM1 


Transmit VM1 to Si. 


N/A 


073/21 


Si SR2 


Read PM counters 00-17 and 
increment pointer. 


Increment performance 
counter. 


073/25 


SR2 Si 


Issue PM maintenance 
advance. 


N/A 


073/31 


Si SR3 


Read PM counters 20-37 and 
increment pointer. 


Clear all maintenance modes. 


073/75 


SR7 S/ 


Transmit (Si) to maintenance 
mode register. 


N/A 


073//1 


S/ SR; 


Transmit (SR» to Si. 


N/A 


073//5 


SR; Si 


Transmit (Si) to SRj. 


N/A 


073//6 


ST.A/ Si 


Transmit (Si) to ST designated 
by (A/). 


N/A 


005400 
150//0 


Vi y/<vo 


Shift (V/) left (V0) places to Vi. 


N/A 


005400 
151//0 


Vi V/>V0 


Shift (V/) right (V0) places to Vi. 


N/A 


005400 
152//* 


Vi V;\A* 


Transfer (V/) to (V7) starting at 
element (Ait). 


N/A 


174//3 


Vi ZV; 


Transmit leading zero count of 
(V/)toV/. 


N/A 
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Special Register Values 



If the SO and AO registers are referenced in the h, j, or k fields of certain 
instructions, the contents of the respective register are not used; instead, 
a special operand is generated. This special operand is available 
regardless of existing AO or SO reservations, and in this case the 
reservation on the register is not checked. This special operand does not 
alter the actual value of the SO or AO register. If register SO or AO is 
referenced in the i field as an operand, the value stored in the register is 
used. The CAL assembler issues a caution-level error message for AO or 
SO when does not apply to the i field. Table 2-6 lists the special 
register values. 



Table 2-6. Special Register Values 


Field 


Operand Value 


Ah,h = Q 





Am = 





M, A:=0 


1 


Sj,j = 





Sk, k = 


2 63 = 1 



Special CAL Syntax Forms 



Certain machine instructions can be generated from two or more 
different CAL instructions. Any of the operations performed by special 
instructions can be performed by instructions in the basic CAL 
instruction set. 

For example, the following CAL instructions generate machine 
instruction 002000, which enters a 1 into the vector length (VL) register: 

VL A0 
VL 1 

The first instruction is the basic form of the enter VL instruction, which 
takes advantage of the special case where (Ak)=l if k=0. The second 
instruction is a special syntax form providing the programmer with a 
more convenient notation for the special case. 
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Monitor Mode Instructions 



The monitor mode instructions (channel control, set real-time clock, 
programmable clock interrupts, and so on) perform specialized functions 
that are useful to the operating system. These instructions run only when 
the CPU is operating in monitor mode. If a monitor mode instruction 
issues while the CPU is not in monitor mode, it is treated as a 
no-operation instruction. 

In several cases, a single CAL instruction can generate several different 
machine instructions. These cases provide for entering the value of an 
expression into an A register or an S register, or for shifting S register 
contents. The assembler determines which instruction to generate from 
characteristics of the expression. 



Program Range 



The program range, or maximum program length, is 4 Mwords in Y-MP 
mode and 1 Gword in C90 mode. An instruction outside these ranges 
produces an undefined result. 



CPU Instruction Summary 



This subsection introduces and summarizes all mainframe instructions 
used by the CRAY Y-MP C90 computer system. The instructions are 
summarized two ways: by the functional unit that executes the 
instruction and by the function the instruction performs. 

The following instruction summaries use the acronyms and abbreviations 
that were defined in previous sections. A glossary is provided at the end 
of this manual; acronyms and abbreviations are defined there. 



In some instructions, register designators are prefixed by the following 
letters that have special meaning to the assembler. The letters and their 
meanings are listed as follows: 

Letter Description 



F 


Floating-point operation 


H 


Half-precision floating-point operation 


I 


Reciprocal iteration 


P 


Population count 


Q 


Parity count 


R 


Rounded floating-point operation 


Z 


Leading-zero count 
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Character Operation 



/ 
# 
> 
< 

& 
! 

\ 



Arithmetic sum of specified registers 
Arithmetic difference of specified registers 
Arithmetic product of specified registers 
Reciprocal approximation 
Use one's complement 
Shift value or form mask from left to right 
Shift value or form mask from right to left 
Logical product of specified registers 
Logical sum of specified registers 
Logical exclusive OR of specified registers 



An expression (exp) occupies the jk, ijkm, or mn field. The h, i, j, and k 
designators indicate the field of the machine instruction into which the 
register designator constant or symbol value is placed. 



Functional Units Instruction Summary 



Instructions other than simple transmit or control operations are 
performed by specialized hardware components known as functional 
units. Listed below are the machine instructions performed by each of 
the functional units. 



Functional Unit 



Instructions 



Address add (integer) 
Address multiply (integer) 
Scalar add (integer) 
Scalar logical 
Scalar shift 

Scalar pop/parity/leading zero 
Vector add (integer) 
Vector logical 
Second vector logical 
Vector shift 
Vector pop/parity 
Floating-point add 
Floating-point multiply 
Floating-point reciprocal 
Memory (scalar) 
Memory (vector) 



030, 031 

032 

060, 061 

042 through 051 

052 through 057 

026, 027 

154 through 157 

140 through 147, 175 

140 through 145 

150 through 153 

174yl, 174J/2 

062, 063, 170 through 173 

064 through 067, 160 through 167 

070, 174//0 

10/* through 13/* 

176, 177 
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Functional Instruction Summary 



This subsection summarizes the instructions by the functions they 
perform. Included is a brief, general description of the function of each 
group of instructions; then the machine instruction, the CAL syntax, and 
a description is listed. 

NOTE: The following superscripts in the machine instruction column 
are used throughout the instruction summary: 

Superscript Description 



C90 mode only 
Y-MP mode only 
Privileged to monitor mode 



Register Entry Instructions 



The register entry instructions transmit values, such as constants, 
expression values, or masks, directly into registers. 



Transfers into A Registers 



The following instructions transmit values into the A registers. 

Description 



Machine 
Instruction CAL Syntax 



020*00 nm 


Ai exp 


021*00 nm 


Ai exp 


022ijk 


Ai exp 


031i00 


Ai -1 



Transmit nm to Ai 

Transmit one's complement of 
nm to Ai 

Transmit^ to Ai 

Transmit -1 to Ai; (Ai = 77777777) 



Transfers into S Registers 



The following instructions transmit values into the S registers. 



Machine 
Instruction CAL Syntax 

040i00 nm Si exp 



Description 

Transmit nm to Si - 31 
(32 - 63 = 0) 
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Machine 

Instruction CAL Syntax 

040/20 nm Si Si.exp 



040/40 nm Si exp:Si 
041/00 nm Si exp 
042/00 Si -1 



042ijk 



042ijk 



Si <exp 



Si #>exp 



047/00 Si #SB 

051/00 Si SB 

071/30 Si 0.6 

071/40 S/ 0.4 

071/50 Si 1.0 



Description 

Transmit nm to S/ 00-31 
(32 - 63 unchanged) 

Transmit nm to Si 32 - 63 
(00 - 31 unchanged) 

Transmit one's complement of nm 
to Si (32 - 63 = 1) 

Enter -1 into Si; (Si = 177777 
177777 177777 177777) 

Form ones mask exp bits in Si from 
the right; jk field contains 100s -exp 

Form zeros mask exp bits in S/ from 
the left; jk field get exp 



042/77 


Si 1 


Enter 1 into Si 


043/00 


Si 


Clear Si 


043//A: 


S > exp 


Form ones mask exp bits in Si from 
the left; jk field gets exp 


043//* 


Si #<exp 


Form zeros mask exp bits in Si from 
the right; jk field contains 
1008 -exp 



Enter one's complement of sign bit 
into Si 

Enter sign bit into Si 

Transmit constant 0.75*2**48 to Si 
(Si = 040060 140000 000000 
000000) 

Transmit constant 0.4 to Si 
(Si = 040000 100000 000000 
000000) 

Transmit constant 1.0 to Si 
(Si = 040001 100000 000000 
000000) 
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Machine 

Instruction CAL Syntax 



071*60 



Si 2.0 



071/70 Si 4.0 



Description 

Transmit constant 2.0 to Si 
(Si = 040002 100000 000000 
000000) 

Transmit constant 4.0 to Si 
(Si = 040003 100000 000000 
000000) 



Transfers into V Registers 



The following instructions transmit values into the V registers. 



Machine 






Instruction 


CAL Syntax 


Description 


077*0* 


Vi,A* 


Clear Vi element (A*) 


145m 


Vi 


Clear Vi 



Transfers into Semaphore Registers 

The following instructions transmit values into the semaphore registers. 



Machine 






Instruction 


CAL Syntax 


Description 


0034/* 


SM;* 1,TS 


Test and set semaphore;'* (/2 = 0) 


0034;* 


SM,A* 1,TS 


Test and set semaphore (A*) (/2 = 1) 


00367* 


SM;* 


Clear semaphore;* (/2 = 0) 


0036/* 


SM,A* 


Clear semaphore (A*) (/2 = 1) 


0037;* 


SM;* 1 


Set semaphore ;* (/2 = 0) 


0037;* 


SM,A* 1 


Set semaphore (A*) 02=1) 



Interregister Transfer Instructions 



The interregister transfer instructions transmit the contents of one 
register to another register. In some cases, the register contents can be 
complemented, converted to floating-point format, or sign extended as a 
function of the transfer. 
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Transfers to A Registers 



The following instructions transfer the contents of other registers into the 
A registers. 



Machine 

Instruction CAL Syntax 



023//0 
023*01 
024ijk 
026ij4 

026ij5 

026//6 

026//7 
030/0* 
031/0* 
033/00 

033//0 

033//1 



A/ Sj 
A/ VL 
A/ Bjk 
A/ SB,A/,+l 

A/ SB/,+1 

A/ SB/,A/ 

A/ SB; 
A/ AA: 
A/ -A* 
A/ CI 

A/ CA,A/ 

A/ CE,A/ 



Description 

Transmit (Sj) to A/ 

Transmit VL to A/ [VL = 128 (64)] 

Transmit (Bjk) to A/ 

Transmit (SB) designated by (A/) 
to A/, and increment (SB,A/) by 1 

Transmit (SB/) to A/, and increment 
(SB;)byl 

Transmit (SB) designated by (A/) 
to A/ 

Transmit shared B/ to A/ 

Transmit (A*) to A/ 

Transmit the negative of (AA) to A/ 

Transmit I/O interrupting channel 
number to A/ 

Transmit I/O-current address of 
channel (A/) to A/ 

Transmit channel status word 
(A/) to A/ 



Transfers to S Registers 



The following instructions transmit the contents of other registers into 
the S registers. 



Machine 

Instruction CAL Syntax 



047/OA 



Si #S£ 



Description 

Transmit one's complement of 
(S£) to S/ 
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Machine 

Instruction CAL Syntax 

051iOA: Si SA: 

O6I1OA: Si -Sk 

OlliOk Si Ak 



Ollilk 



Si +Ak 



Description 

Transmit (Sk) to Si 

Transmit negative of (SA:) to Si 

Transmit (AA:) to Si with no sign 
extension 

Transmit (AA:) to Si with sign 
extension 



Transfers to V Registers 



071i2k 


Si *Fak 


Transmit (AA:) to Si as unnormalized 
floating-point number (exponent 
equals 40060) 


072/00 


Si RT 


Transmit (RTC) to Si 


072/02 


Si SM 


Transmit (SM) to Si 


072(/3 


Si ST/ 


Transmit (ST/) to Si 


073/00 


Si VM 


Transmit vector mask lower to Si 


074ijk 


SiTjk 


Transmit (Tjk) to Si 


076ijk 


Si Vj,Ak 


Transmit (Vj, element (AA:)) to Si 



The following instructions transmit the contents of other registers into 
the V registers. 

Machine 



Instruction 


CAL Syntax 


Description 


OUijk 


Vi,AA: S/ 


Transmit (S/) to Vi element (AA:) 


142iOk 


Vi VA: 


Transmit (Vk) to Vi 


156i0k 


Vi -Vk 


Transmit negative of (VA:) to Vi 
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Transfers to Intermediate Registers 



The following instructions transmit the contents of A and S registers into 
the intermediate B and T registers. 



Machine 






Instruction 


CAL Syntax 


Description 


025ijk 


Bjk Ai 


Transmit (Ai) to Bjk 


075*/* 


Tjk Si 


Transmit (Si) to Tjk 



Transfers to Shared Registers 



The following instructions transmit the contents of other registers into 
the shared registers. 



Machine 






Instruction 


CAL Syntax 


Description 


027i/6 


SB,A/ Ai 


Transmit (Ai) to SB designated 
by (Aft 


027ij7 


SB; Ai 


Transmit (Ai) to shared B; 


073*02 


SM Si 


Transmit (Si) to the semaphore 
registers 


073//3 


ST; Si 


Transmit (Si) to shared T; 


073//6 


ST,A/ Si 


Transmit (Si) to ST designated 
by (40 



Transfers to Status Registers 



The following instructions transfer the contents of an S register into the 
status registers. 



Machine 

Instruction CAL Syntax 

073i05 SR0 Si 

073i75 3 SR7 Si 



Description 

Transmit (Si) to status register 

Transmit (Si) to maintenance mode 
register (status register 7) 
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Transfer to Vector Mask Register 



The following instructions transmit the contents of other registers into 
the vector mask register. 



Machine 
Instruction 

0030/0 



0030;! 1 



CAL Syntax Description 

VM S/ Transmit (S/) to vector mask lower 

Wm = (2127 - 264), lower elements 
0-63 

VM1 S/' Transmit (S/) to vector mask upper 

V/n = (263 - 20), upper elements 
64 - 127 



Transfer to Vector Length Register 



The following instructions transmit the contents of other registers into 
the vector length register. 



Machine 

Instruction CAL Syntax 



00200/: 
002000 



VL Ak 
VL 1 



Description 
Transmit (Ak) to VL 
Transmit 1 to VL 



Memory Transfer Instructions 



The memory transfer instructions enable or disable bidirectional memory 
transfers, transfer data between registers and memory, and ensure 
completion of memory references. 



Bidirectional Memory Transfers 



The following instructions enable or disable bidirectional memory 
transfers. 



Machine 

Instruction CAL Syntax 



002500 



002600 



DBM 



EBM 



Description 

Disable bidirectional memory 
transfers (BDM = 0) 

Enable bidirectional memory 
transfers (BDM = 1) 
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Memory References 



The following instructions ensure completion of instructions for 
bidirectional memory transfers. 



Machine 






Instruction 


CAL Syntax 


Description 


002700 


CMR 


Complete memory references 


002704 


CPA 


Complete port reads and writes 


002705 


CPR 


Complete port reads 


002706 


CPW 


Complete port writes 



The following instructions write values into memory. 



Machine 

Instruction CAL Syntax 

035ijk ,A0 Bjk,Ai 

037ijk ,A0 Tjk,Ai 

11/m'OO nm exp,PJi Ai 

13/m'OO nm exp,Ah Si 

17707* ,A0,M V; 



1770/0 



1771;/: 



,A0, 1 V; 



,A0,V£ V; 



Description 

Write (Ai) words from B register jk 
to memory address ((A0) + (DBA)) 

Write (Ai) words from T register jk 
to memory address ((A0) + (DBA)) 

Write (Ai) to memory address ((Ah) 
+ exp + (DBA)) 

Write (Si) to memory address ((Ah) 
+ exp + (DBA)) 

Write (VL) words from V; to 
memory address ((A0) + (DBA)) 
incremented by (Ak) 

Write (VL) words from V; to 
memory address ((A0) + (DBA)) 
incremented by 1 

Write (VL) words from V; to 
memory address ((A0) + (Vk) + 
(DBA)) (scatter) 
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Reads 



The following instructions read values from memory. 



Machine 

Instruction CAL Syntax 



034ijk 



036ijk 



176/00 



176ilk 



Bjk,Ai ,A0 



Tjk,Ai ,A0 



lOhiOO nm Ai exp,Ah 
12/w'OO nm Si exp,Ah 
176i0k Vi ,A0,M 



Vi ,A0,1 



Vi ,A0,Vk 



Description 

Read (Ai) words to B register//: 
from memory address ((A0) + 
(DBA)) Port A 

Read (Ai) words to T register jk 
from memory address ((A0) + 
(DBA)) Port B 

Read from memory ((Ah) + exp + 
(DBA)) to Ai 

Read from memory address ((Ah) + 
exp + (DBA)) to Si 

Read (VL) words to Vi from 
memory address ((A0) + (DBA)) 
incremented by (Ak) 

Read (VL) words to Vi from 
memory address ((A0 + (DBA)) 
incremented by 1 

Read (VL) words to Vi from 
memory address ((A0 + (Vk) + 
(DBA)) (gather) 



Integer Arithmetic Instructions 



Integer arithmetic operations obtain operands from registers and return 
results to registers. No direct memory references are allowed. 

The assembler recognizes several special syntax forms for increasing or 
decreasing register contents, such as the operands Ai+1 and Ai-1; 
however, these references actually result in register references such that 
the 1 becomes a reference to Ak with k = 0. 

All integer arithmetic is two's complement and is represented as such in 
the registers. The address add and address multiply functional units 
perform 36-bit (Y-MP mode) and 64-bit (C90 mode) arithmetic. The 
scalar add functional unit and the vector add functional unit perform 
64-bit arithmetic. No overflow conditions are detected by functional 
units when performing integer arithmetic. 
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32-bR Integer Arithmetic 



64-bit Integer Arithmetic 



Multiplication of two fractional operands is accomplished using a 
floating-point multiply instruction. The floating-point multiply 
functional unit recognizes conditions in which both operands have zero 
exponents as a special case and returns the high-order 48 bits of the 
result as an unnormalized fraction. Division of integers requires that 
they first be converted to floating-point format and then divided using 
the floating-point functional units. 



The following instructions perform 32-bit integer arithmetic. 



Machine 

Instruction CAL Syntax 



030ijk 
030y0 
031yifc 

031i/0 

032ijk 



Ai Aj + AJc 
Ai AJ + 1 
Ai Aj-AJc 

Ai A/-1 

Ai AJ * Ak 



Description 

Integer sum of (Aj) and (Ak) to Ai 

Integer sum of (Aj) and 1 to Ai 

Integer difference of (Aj) and 
(Ak) to Ai 

Integer difference of (Aj) and 
ltoAi 

Integer product of (Aj) and 
(Ak) to Ai 



The following instructions perform 64-bit integer arithmetic. 



Machine 






Instruction 


CAL Syntax 


Description 


060ijk 


Si S; + SA: 


Integer sum of (S;) and (Sk) to Si 


061ijk 


Si Sj-Sk 


Integer difference of (Sj) and 
(Sk) to Si 


\5Aijk 


Vi Sj + Vk 


Integer sums of (Sj) and (Vk) to \ 


\55ijk 


Vi Vj + Vk 


Integer sums of (V/) and (Vk) to ^ 


156ijk 


Vi Sj-Vk 


Integer differences of (Sj) and 
(Vk)toVi 


151ijk 


Vi V;-V* 


Integer differences of (V/) and 
(Vk)\oVi 



HR-04028 



2-65 



Mainframe 



CRAY Y-MP C90 Functional Description Manual 



Floating-point Arithmetic Instructions 



All floating-point arithmetic operations use registers as the source of 
operands and return results to registers. 

Floating-point numbers are represented in a standard format throughout 
the CPU. This format is a packed representation of a binary coefficient 
and an exponent or power of 2. The coefficient is a 48-bit signed 
fraction. The sign of the coefficient is separated from the rest of the 
coefficient. Because the coefficient is signed magnitude, it is not 
complemented for negative values. Refer to "Floating-point Arithmetic' 
earlier in this section for more information on floating-point numbers 
and arithmetic. 



Floating-point Range Errors 



The following instructions enable or disable floating-point range errors 
to be flagged. 



Machine 

Instruction CAL Syntax 



002100 



002200 



EFI 



DFI 



Description 

Enable FP Interrupt 
(IFP = 1, Clear FPS) 

Disable FP Interrupt 
(IFP = 0, Clear FPS) 



Floating-point Addition and Subtraction 

The following instructions perform floating-point addition or subtraction. 



Instruction 


CAL Syntax 


Description 


062ijk 


Si Sj + FSA: 


Floating-point sum of (S/) and 
(S&) to Si 


062/OA: 


Si +FS£ 


Normalize (Sk) to Si 


063ijk 


Si Sj-FSk 


Floating-point difference of (Sj) and 
(SA:) to Si 



063i0£ Si -FSk Transmit normalized negative of 

(SA:) to Si 

170ijk Vi S; + FVA; Floating-point sums of (Sj) and 

(V£)toVi 
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Machine 

Instruction CAL Syntax 

170i0* Vi +FVk 

lllijk Vi Vj + FVA: 



172ijk 



172i0k 



173*/* 



Vi S; - FVA: 



Vi -FVk 



Vi Vj - FVk 



Description 

Normalize (Vk) to Vi 

Floating-point sums of (Vj) and 
(Vk) to Vi 

Floating-point differences of (Sy) 
and (Vk) to Vi 

Transmit normalized negatives of 
(Vk) to Vi 

Floating-point differences of (Vj) 
and (Vk) to Vi 



Floating-point Multiplication 



The following instructions perform floating-point multiplication. 



Machine 






Instruction 


CAL Syntax 


Description 


064ijk 


Si S/ * FSA: 


Floating-point product of (S/) and 
(SA:) to Si 


065ijk 


Si S; * HSA: 


Half -precision rounded 
floating-point product of (S/) and 
(Sk) to Si 


066ijk 


Si S; * RSA: 


Full-precision rounded 
floating-point product of (S/) and 
(SA:) to Si 


160ijk 


Vi S; * FVA: 


Floating-point products of (S/) and 
(Vk) to Vi 


161ijk 


Vi V; * FV7: 


Floating-point products of (Vj) and 
(Vk) to Vi 


162ijk 


Vi S; * HVA: 


Half-precision rounded 
floating-point products of (S/) and 
(Vk) to Vi 


163ijk 


Vi VJ * HVA: 


Half-precision rounded 



floating-point products of (Vj) and 
(Vk)toVi 
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Machine 
Instruction CAL Syntax 



164*/* 



165ijk 



Vi Sj * RVk 



Vi V; * RVA: 



Description 

Rounded floating-point products of 
(S;)and(VA:)toVi 

Rounded floating-point products of 
(Y/) and (Vk) to Vi 



Reciprocal Approximation 



The following instructions perform floating-point reciprocal 
approximation operations. 



Machine 
Instruction 

070i/0 
174//0 



CAL Syntax Description 

Si /HS/ Floating-point reciprocal 

approximation of (Sj) to Si 

Vi fHVj Floating-point reciprocal 

approximations of (V/) to Vi 



Logical Operation Instructions 



The scalar and vector logical functional units perform bit-by-bit 
manipulation of 64-bit quantities. Logical operations include logical 
products, logical sums, logical exclusive ORs, logical equivalence, 
vector mask, and merges. Logical operations are defined below. 

• A logical product (& operator) is the AND function. 

• A logical exclusive or (\ operator) is the exclusive OR function. 

• A logical sum (! operator) is the inclusive OR function. 

• A logical merge combines two operands depending on a ones mask 
in a third operand. The result is defined by (operand 2 & mask) ! 
(operand 1 & #mask). 
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Logical Products 



The following instructions peform logical product operations. 



Machine 

Instruction CAL Syntax 



044*;* 



Ulijk 



Si Sj&Sk 



044i/0 


Si S/&SB 


044i/0 


Si SB&Sj 


045ijk 


Si #Sk&Sj 


045ij0 


Si #SB&S/ 


UQijk 


Vi Sj&Vk 



Vi Vj&Vk 



Description 

Logical product of (Sj) and (Sk) 
to Si 

Sign bit of (S;) to Si 

Sign bit of (S;) to Si 0" p 0) 

Logical product of (Sj) and one's 
complement of (Sk) to Si 

(Sj) with sign bit cleared to Si 

Logical products of (Sj) and 
(Vk) to Vi 

Logical products of (Vj) and 
(Vk) to Vi 



Logical Sums 



Machine 




Instruction 


CAL Syntax 


051ijk 


Si S;!SA: 


051i/0 


Si S/!SB 



The following instructions perform logical sum operations. 

Description 

Logical sum of (Sj) and (Sk) to Si 

Logical sum of (Sj) and sign bit 
to Si 

Logical sum of (Sj) and sign bit 
to Si (/ * 0) 

Logical sums of (Sj) and (Vk) to Vi 

Logical sums of (Vj) and (Vk) to Vi 



051i/0 

Ulijk 
U3ijk 



Si SB!S; 
Vi S;!VA: 

Vi vyrv* 
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Logical Exclusive ORs 



The following instructions perform exclusive OR operations. 



Machine 






Instruction 


CAL Syntax 


Description 


046ijk 


Si Sj\Sk 


Exclusive OR of (S/) and 
(Sk) to Si 


046i/0 


Si S/ASB 


Toggle sign bit of (S/), then enter 
into Si 


046//0 


Si SB\Sj 


Toggle sign bit of (S/), then enter 
into Si 0' * 0) 


lUijk 


Vi SjWk 


Exclusive ORs of (S/) and 
(Vk) to Vi 


U5ijk 


Vi VjXVk 


Exclusive ORs of (V/) and 
(Vk) to Vi 



Logical Equivalence 



The following instructions perform logical equivalence operations. 

Machine 

Instruction CAL Syntax Description 



047ijk Si #Sj\Sk 

047i/0 Si #SB\S; 

047y0 Si #S;\SB 



Logical equivalence of (S£) and (S/) 
to Si 

Logical equivalence of (S/) and sign 
bit to Si (/ ^ 0) 

Logical equivalence of (S/) and sign 
bit to Si 
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Vector Mask 



The following instructions test the elements of a vector register and use 
the test results to set the corresponding bits of the vector mask register. 



Description 

SetVMbitif(Y/) = 

SetVMbitif(V/)*0 

SetVMbitif(V/)sO 

SetVMbitif(V/)<0 

Set VM bit if (Vj) = 0; also, store 
the compressed indices of the Vj 
elements = in the Vi elements 

Set VM bit if (Vj) * 0; also, store 
the compressed indices of the Vj 
elements * in the Vi elements 

Set VM bit if (Vj) ;> 0; also, store 
the compressed indices of the Vj 
elements ^ in the Vi elements 

Set VM bit if (Vj) < 0; also store the 
compressed indices of the Vj 
elements < in the Vi elements 



Machine 




Instruction 


CAL Syntax 


1750;0 


vm y/,z 


1750/1 


VM V/,N 


1750/2 


vm y/,p 


1750J3 


VM V/,M 


17504 


Vi,VM y/,z 


175i/5 


Vi,VM V/,N 


175i/6 


Vi,VM V/,P 


YlSijl 


Vi,VM VjM 



Merge 



The following instructions perform a logical merge that combines two 
operands according to the bits set in a ones mask in a third operand. 

Machine 

Instruction CAL Syntax Description 

050ijk Si Sj\Si&Sk Logical product of (Si) and (Sk) 

complement ORed with logical 
product of (S;) and (Sk) to Si 

050//0 Si Sj\Si&SB Scalar merge of (Si) and sign of 

(Sj)toSi 
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Machine 

Instruction CAL Syntax 

U6ijk Vi SjlVk&VM 

146*0* Vi #VM&V£ 

Ulijk Vi VjlVk&VM 



Description 

Transmit (S/) if VM bit = 1 or (Vk) 

if 

VMbit = 0toVi 

Vector merge of (Vk) and to Vi 

Transmit (Vj) if VM bit = 1 or (Vk) 
ifVMbit = 0toVi 



Shift Instructions 



The scalar shift functional unit and vector shift functional unit shift 
64-bit quantities or 128-bit quantities. A 128-bit quantity is formed by 
concatenating two 64-bit quantities. The number of bits a value is 
shifted left or right is determined by the value of an expression for some 
instructions and by the contents of an A register for other instructions. If 
the count is specified by an expression, the value of the expression must 
not exceed 64. 

Machine 

Instruction CAL Syntax Description 



052ijk 


SO Si < exp 


Shift (Si) left exp = jk places to SO 


053ijk 


SO Si > exp 


Shift (Si) right exp = 100 8 -jk 
places to SO 


054ijk 


Si Si < exp 


Shift (Si) left exp = jk places to Si 


055ijk 


Si Si > exp 


Shift (Si) right exp = 100 8 -jk 
places to Si 


056ijk 


Si Si,S/ < A& 


Shift (Si) and (Sj) left (A*) places 
to Si 


056//0 


Si Si,S;<l 


Shift (Si) and (S/) left one place 
toSi 


056iOA: 


Si Si<M 


Shift (Si) left (Ak) places to Si 


057ijk 


Si S;,Si>M 


Shift (S;) and (Si) right (AJfc) places 
to Si 



057i/0 



Si S/,Si>l 



Shift (S;) and (Si) right one place 
to Si 
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Machine 




Instruction 


CAL Syntax 


057/OA: 


Si Si>Ak 


150ijk 


Vi Vj < Ak 


150y0 


Vi Vj < 1 


005400 


Vi Vj < V0 


150i/0 




151ijk 


Vi Vj > Ak 


151ij0 


Vi Vj > 1 


005400 


Vi Vj > V0 


151i/0 




152ijk 


Vi Vj,Vj<Ak 


152ij0 


Vi Vj,Vj<l 


005400 


ViVj,Ak 


152ijk 




153ijk 


Vi Vj,Vj>Ak 


153ij0 


Vi Vj,Vj>l 



Description 

Shift (Si) right (M) places to Si 
Shift (Vj) left (Ak) places to Vi 
Shift (Vj) left one place to Vi 
Shift (Vj) left (V0) places to Vi 

Shift (Vj) right (M) places to Vi 
Shift (Vj) right one place to Vi 
Shift (Vj) right (V0) places to Vi 



Double shift (Vj) left (A*) places 
toVi 

Double shift (Vj) left one place 
toVi 

Vector word shift of (Vj) starting at 
element (Ak) to Vi ((Ak)<VL) 

Double shift (Vj) right (Ak) places 
toVi 

Double shift (Vj) right one place 
toVi 



Bit Count Instructions 



Bit count instructions count the number of set bits or the number of 
leading bits in an S or V register. 



Scalar Population Count 



The following instruction performs the scalar population count. 



Machine 

Instruction CAL Syntax 



026i/0 



Ai PS; 



Description 

Population count of (Sj) to Ai 
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Vector Population Count 



The following instruction performs the vector population count. 



Machine 

Instruction CAL Syntax 



174i/l 



Vi PV; 



Description 

Population counts of (Vy) to Vi 



Population Parity Count 



The following instructions perform population parity count. 



Machine 






Instruction 


CAL Syntax 


Description 


026i/l 


A* QS; 


Population count parity of (S/) to Ai 


174i/2 


Vi QVj 


Population count parities of (V/) 



toVi 



Scalar Leading Zero Count 



The following instruction performs the scalar leading zero count. 



Machine 

Instruction CAL Syntax 



027i/0 



Ai ZSj 



Description 

Leading zero count of (S;) to Ai 



Vector Leading Zero Count 



The following instruction performs the vector leading zero count. 



Machine 

Instruction CAL Syntax 



174i/3 



Vi ZV; 



Description 

Leading zero count of (V;) to Vi 



Branch Instructions 



Instructions in this category include conditional and unconditional 
branch instructions. An expression or the contents of a B register specify 
the branch address. An address is always taken to be a parcel address 
when the instruction runs. If an expression has a word-address attribute, 
the assembler issues an error message. 
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Unconditional Branch Instructions 

The following instructions perform unconditional branch operations. 



Machine 
Instruction 


CAL Syntax 


Description 


0050/1 


J Bjk 


Jump to (Bjk) 


0051/* 


Jinv Bjk 


Jump to (Bjk) 
invalidates ins 


006ijkm 2 


J exp 


Jump to exp 


006000 nm 1 


J exp 


Jump to exp 



Conditional Branch Instructions 

The following instructions perform conditional branch operations. 



Machine 

Instruction CAL Syntax 

0064jk nm 1 JTSjk exp 

0064/A:n/n 1 JTS,A& exp 

OlOijkm 2 JAZ exp 

010000 nm 1 JAZ exp 

Ollijkm 2 JAN exp 

011000 ww 1 JAN exp 

Ollijkm 2 JAP exp 

012000 nm 1 JAP exp 

013ijkm 2 JAM exp 

013000 nm 1 JAM exp 

OUijkm 2 JSZ exp 



Description 

Branch to exp if (SMjk) = 1; else set 
SMjk 0*2 = 0) 

Branch to exp if (SM,(A£)) = 1; else 
setSM,(M) 0*2 = 1) 

Jump to exp if (A0) = 0*2 = 0) 

Jump to exp if (A0) = 

Jump to exp if (A0) * 02 = 0) 

Jump to exp if (A0) * 

Jump to exp if (A0) is positive; 
(A0) a: ( il = 0) 

Jump to exp if (A0) is positive; 
(A0) s> 

Jump to exp if (A0) is negative 
0'2 = 0) 

Jump to exp if (A0) is negative 

Jump to exp if (SO) = 0*2 = 0) 
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Machine 

Instruction CAL Syntax 

014000 nm 1 JSZ exp 

015ijkm 2 JSN exp 

015000 nm 1 JSN exp 

016ijkm 2 JSP exp 

016000 nm 1 JSP exp 

017ijkm 2 JSM exp 

017000 nm 1 JSM exp 



Description 

Jump to exp if (SO) = 

Jump to exp if (SO) * (i2 = 0) 

Jump to exp if (SO) * 

Jump to exp if (SO) is positive; 
(SO) ;> (i2 = 0) 

Jump to exp if (SO) is positive; 
(SO) * 

Jump to exp if (SO) is negative 
(i2 = 0) 

Jump to exp if (SO) is negative 



Return Jump 



The following instructions perform a return jump operation. 



Machine 

Instruction CAL Syntax 

OOlijbn 2 R exp 

007000 nm 1 R exp 



Description 

Return jump to exp and set register 
BOO to (P) + 2 

Return jump to exp and set register 
BOO to (P) + 3 



Normal Exit 



The following instruction performs a normal exit operation. 



Machine 

Instruction CAL Syntax 



004000 



EX 



Description 
Normal Exit 
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Error Exit 



The following instruction performs an error exit operation. 

Description 



Machine 

Instruction CAL Syntax 



000000 



ERR 



Error exit 



Monitor Mode Instructions 



Monitor mode instructions are executed only when the CPU is in monitor 
mode. An attempt to execute one of these instructions when not in 
monitor mode is treated as a pass instruction. The instructions perform 
specialized functions useful to the operating system. 



Channel Control 



The following instructions perform channel control operations. 



Machine 
Instruction 

0010/* 3 



0012/1 3 



CAL Syntax Description 

CA,A/ AA: Set channel (A/) CA register to (Ak) 

and begin I/O sequence 



001000 3 


NOP 


Pass 


OOlljk 3 


CL,A/ 


Set channel (A/) CL register to (Ak) 


0012/0 3 


CI,A/ 


Clear channel (A/) interrupt and 
error flags; clear device master clear 
(output channel) 



MC,A/ 



Clear channel (A/) interrupt and 
error flags, set device master clear 
(output channel); clear device 
ready-held (input channel) 



Set Exchange Address 



The following instruction sets the exchange address. 

Description 



Machine 

Instruction CAL Syntax 



0013/0 3 



XAAj 



Enter XA register with (A/) 
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Set Real-time Clock 



The following instruction performs a real-time clock operation. 



Machine 

Instruction CAL Syntax 

0014/0 3 RT S/ 



Description 

Enter RTC register with (S/) 



Set Cluster Number 



The following instruction sets the cluster number. 

Description 



Machine 

Instruction CAL Syntax 



0014/3 3 CLN Ay- 



Transmit (A/) to cluster number 



Programmable Clock Interrupt 



The following instructions perform programmable clock operations. 



Machine 

Instruction CAL Syntax 

0014/4 3 PCI S/ 

001405 3 CCI 

001406 3 ECI 

001407 3 DCI 



Description 

Transmit (S/) to programmable 
clock 

Clear programmable clock interrupt 
(PCI) request 

Enable PCI (MM IPC only) 

Disable PCI (MM IPC only) 



Operand Range Error Interrupt 



The following instructions enable or disable operand range error 
interrupts. 



Machine 






Instruction 


CAL Syntax 


Description 


002300 


ERI 


Enable range interrupt (IOR = 1) 


002400 


DRI 


Disable range interrupt (IOR = 0) 
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Interprocessor Interrupt 



The following instructions perform interprocessor interrupt operations. 



Machine 








Instruction 


CAL 


, Syntax 


Description 


0014/1 3 


SIPI 


A/ 


Set interprocessor interrupt of 
CPU (A/) 


001401 3 


SIPI 




Send interprocessor interrupt 
to CPUO 


001402 3 


CIPI 




Clear interprocessor interrupt 



Breakpoint Interrupt 



The following instructions enable or disable breakpoint interrupts. 



Machine 

Instruction CAL Syntax 



002301 



002401 



EBP 



DBP 



Description 

Enable breakpoint interrupt 
(IBP = 1) 

Disable breakpoint interrupt 
(IBP = 0) 



Performance Counters 



The following instructions operate the performance monitor. 



Instruction 


CAL Syntax 


001500 3 




073/21 3 


Si SR2 


073/25 3 


SR2 Si 


073/31 3 


Si SR3 



Description 

Clear all performance monitor 
Counters 

Read PM counters 00 - 17 and 
increment pointer 

Issue PM maintenance advance 

Read PM counters 20 - 37 and 
increment pointer 
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System Clock 

Speed 4.2 ns 

CPU Specifications 

Number of CPUs 8,16 

Number of registers per CPU: 

Address (A) registers 8 

32 bits each r , 

Intermediate address (B) registers 64 

32 bits each \^ / 
Scalar (S) registers 8 

64 bits each 
Intermediate scalar (T) registers 64 

64 bits each 
Vector (V) registers 8 

64 bits x 128 elements 
(C90 mode) 

64 bits x 64 elements 
(Y-MP mode) 
Vector length (VL) register 1 

8 bits (C90 mode) 

7 bits (Y-MP mode) 
Vector mask (VM, VM1) registers .... 2 

64-bits each 
Program address (P) register 1 

32 bits (C90 mode) 

24 bits (Y-MP mode) 

Number of functional units per CPU: 

Address addition 1 

Address multiplication 1 

Scalar addition 1 

Scalar shift 1 

Scalar logical 1 

Scalar population/parity/leading zero . . 1 

Vector addition 2 

Vector shift 2 

Vector population/parity/leading zero . . 2 

Full vector logical 2 

2nd vector logical 2 

Vector population/parity/leading zero . . 2 

Floating-point addition 2 

Floating-point multiplication 2 

Floating-point reciprocal approx 2 



Shared Resources 

Input/output section: 

2 Input/output subsystems model E (IOS-E) 

6- or 20-Mbyte/s LOSP channels 16 

200-Mbytes/s HISP channels 16 

1800-Mbyte/s VHISP channels 4 

Central memory: 

Word width 64 bits 

SBCDBD error correction 16 bits 

Memory size 256 Mwords 

Number of banks 1,024 

Number of modules 16 

Number of ports per CPU 4 

Number of clusters 17 

Number of shared registers contained in each 
cluster: 

Shared address (SB) registers 8 

32 bits each 

Shared scalar (ST) registers 8 

64 bits each 

Semaphore (SM) registers 32 

1 bit each 

Real-time clock (64-bits) 1 



Physical Description 

Height 85.5 (217 cm) 

Width 116.25 in. (295 cm) 

Depth 101.00 in. (257 cm) 

Weight 16,616 lbs (7,536 kg) 



Support Equipment 

Operator workstation 1 

Maintenance workstation 1 

Heat exchanger units 1 

Refrigeration and condensing unit 1 

Motor-generator sets 1 or 2 



2-82 



HR-04028 



3 I/O SUBSYSTEM 



The Cray Research input/output subsystem model E (IOS-E) controls 
data transfers between several components of the CRAY Y-MP C90 
computer system. The IOS-E transfers data to and receives data from the 
following components. 

The CRAY Y-MP C90 mainframe 

The SSD solid-state storage device model E (SSD-E) 

Peripheral devices such as disk drives and front-end computers 

The operator workstation model E (OWS-E) 

The maintenance workstation model E (MWS-E) 

The IOS-E comprises a maximum of eight clusters and two workstation 
interfaces (WINs). The following subsections describe the I/O clusters 
and WINs. A block diagram of an I/O cluster is provided in Figure 3-1. 



I/O Cluster 



Each I/O cluster contains the following components: 

1 I/O processor multiplexer (IOP MUX) 
4 auxiliary I/O processors (EIOPs) 
16 I/O buffers 

1 low-speed (LOSP) channel pair 

2 high-speed (HISP) channel pairs 
16 channel adapters 

The IOPs (IOP MUX and EIOPs) provide internal control for the I/O 
cluster. The LOSP channels allow the I/O cluster to exchange control 
information with the mainframe. The I/O buffers, HISP channels, and 
channel adapters provide the paths for data transfers between the IOS-E 
and the mainframe, SSD-E, and peripheral devices. 
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Devices 
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A-10394 



I/O Processor 



Figure 3-1. Cluster Channel Connections 



The following subsections describe the components of an I/O cluster. 



The IOPs control all data transfers in to and out of the I/O cluster. The 
IOP MUX communicates with the mainframe and controls data transfers 
to or from the mainframe. The IOP MUX also controls data transfers to 
or from the SSD-E. The four EIOPs control data transfers to or from 
peripheral devices. Each EIOP can communicate with the IOP MUX but 
not with other EIOPs. 
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The IOPs are identical; they have the same architecture and execute the 
same instruction set. Each IOP is a high-speed 16-bit (1 parcel) 
computer designed to efficiently control data transfers. Each IOP 
contains a 64-Kparcel local memory that is protected with SECDED 
(single-error correction/double-error detection) logic. Each IOP also 
contains a 128-parcel operand register file that is parity protected and 
three programmable registers. Each IOP contains 29 I/O channels; 5 of 
the channels monitor and control operations within the IOP, and 24 
channels enable the IOP to communicate with external devices. 

Each IOP executes a set of 128 1-parcel or 2-parcel instructions. 
Ninety-six instructions perform basic operations such as data transfers; 
arithmetic (two's complement), logical, and shift operations; conditional 
and unconditional jumps; and subroutine calls and exits. Thirty-two 
instructions, called I/O functions, control and monitor the I/O channels. 



I/O Buffers 



The 16 I/O buffers provide temporary storage for data transferred to or 
from the IOS-E. Each buffer contains 64 Kwords. Each word is 64 bits 
long and is protected with SECDED logic. Each buffer can 
simultaneously pass data to or from a channel adapter while passing data 
to or from the mainframe or SSD-E. Each buffer is dedicated to one 
peripheral device, or in the case of mass storage devices, to one group of 
identical devices. 

Each EIOP is dedicated to 4 of the 16 I/O buffers. Each EIOP controls 
all transfers between its buffers and peripheral devices. The EIOPs work 
with the IOP MUX to control transfers between the buffer and the 
mainframe or SSD-E. Each EIOP can also transfer data between its I/O 
buffers and its local memory. 



Low-speed and High-speed Channels 



The LOSP and HISP channel pairs enable the I/O cluster to communicate 
with the mainframe and SSD-E. The LOSP channel pair transfers 
control information between the IOP MUX and the mainframe. One 
HISP channel pair transfers data between the I/O buffers and the 
mainframe; the second pair transfers data between the I/O buffers and the 
SSD-E. 

The LOSP channel pair operates at 6 or 20 Mbytes/s. The LOSP 
channels are 16 bits wide and contain 4 parity bits for error detection. 
Each channel can operate simultaneously. 
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Channel Adapters 



CCA-1 Channel Adapter 



The HISP channel pairs operate at 200 Mbytes/s. The HISP channels are 
64 bits wide and contain 8 SECDED bits. Each channel can operate 
simultaneously, but each must use a different I/O buffer. 



Channel adapters enable the I/O cluster to communicate with peripheral 
devices. Several types of channel adapters are available; each type of 
channel adapter enables the I/O cluster to communicate with a different 
type of device. Most channel adapters can be connected to only one 
peripheral device; however, channel adapters for disk storage devices can 
be connected to multiple devices of the same type. 

Each channel adapter corresponds to one I/O buffer. During a data 
transfer from a peripheral device, the channel adapter converts the input 
data from the device's format to 64-bit words, generates SECDED bits, 
and then transmits the data and SECDED bits to the I/O buffer. During a 
data transfer to an external device, the channel adapter receives 64-bit 
data words (plus SECDED bits) from the I/O buffer, converts the data to 
the correct format for that device, and then transmits the data to the 
device. 

Each EIOP controls four channel adapters. The EIOPs control and 
monitor all data transfers between the I/O buffers and peripheral devices. 

The following subsections provide specific information on each type of 
channel adapter. The specification sheet at the end of this section 
provides a quick reference summary of all channel adapter specifications. 



The CCA-1 channel adapter contains a LOSP channel pair that transfers 
data between an I/O buffer and an external device such as a front-end 
interface. The LOSP channel pair consists of an input and an output 
channel. Both channels can operate simultaneously. 

Each CCA-1 can support one external device. The maximum transfer 
rate between the CCA-1 and the external device is 6 or 12 Mbytes/s 
(software controlled). The word width of each transfer is 16 bits. 

All data transfers to or from the CCA-1 are checked for data errors. Data 
transfers between the CCA-1 and either the mainframe or the SSD-E are 
protected by SECDED. Data transfers between the CCA-1 and 
peripheral devices are checked for parity errors. 
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DCA-1 Channel Adapter 



DCA-2 Channel Adapter 



The DCA-1 disk channel adapter transfers data between the I/O buffer 
and a disk drive. The DCA-1 disk channel adapter is compatible with 
the DD-40, DD-41, and DD-49 disk drives. 

Each DCA-1 can support one DD-49 disk drive, two DD-40 disk drives, 
or two DD-41 disk drives. The maximum transfer rate between the 
DCA-1 and the disk drive is 12 Mbytes/s. The word width of each 
transfer is 16 bits. 

All data transfers to or from the DCA-1 are checked for data errors. Data 
transfers between the DCA-1 and either the mainframe or the SSD-E are 
protected by SECDED. Data transfers between the DCA-1 and disk 
drives are checked for parity errors. 



The DCA-2 disk channel adapter transfers data between the I/O buffer 
and a disk drive. The DCA-2 disk channel adapter is compatible with 
DD-60 and DD-61 disk drives. 

Each DCA-2 can support up to eight DD-60 or DD-61 disk drives. The 
maximum transfer rate between the DCA-2 and the disk drive is 
24 Mbytes/s. The word width of each transfer is 16 bits. 

All data transfers to or from the DCA-2 are checked for data errors. Data 
transfers between the DCA-2 and either the mainframe or the SSD-E are 
protected by SECDED. Data transfers between the DCA-2 and disk 
drives are checked for parity errors and cyclical redundancy check 
(CRC) errors. 



HCA-3 and HCA-4 Channel Adapters 



The HCA-3 and HCA-4 channel adapters enable the IOS-E to 
communicate with external devices that use a High Performance Parallel 
Interface (HIPPI) channel. The HIPPI channel pair consists of an input 
and output channel. Refer to the "High Performance Parallel Interface 
(HIPPI)" subsection in Section 5 of this manual for more information on 
the HIPPI channel. The input channel is connected to HCA-3; the output 
channel is connected to HCA-4. 

The HIPPI channel can provide high-speed communications between 
Cray Research computer systems. The HIPPI channel also enables a 
Cray Research computer system to be connected to peripheral equipment 
such as network adapters and graphic display devices. 
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TCA-1 Channel Adapter 



Each HCA-3 or HCA-4 channel adapter can support one external HIPPI 
device. The maximum transfer rate between the HCA-3 or HCA-4 and 
the external device is 100 Mbytes/s. The word width of each transfer is 
32 bits. 

All data transfers to or from the HCA-3 are checked for data errors. Data 
transfers from the peripheral equipment to the HCA-3 are checked for 
parity errors and length/longitudinal redundancy check (LLRC) errors. 
Data transfers from the HCA-3 to the mainframe or SSD-E are protected 
by SECDED. 

All data transfers to or from the HCA-4 are checked for data errors. Data 
transfers from the mainframe or the SSD-E to the HCA-4 are protected 
by SECDED. Data transfers from the HCA-4 to the peripheral 
equipment are checked for parity errors and LLRC errors. 



The TCA-1 tape channel adapter transfers data between the I/O buffer 
and tape controllers. The TCA-1 channel adapter supports IBM 
compatible tape controllers. 

Each TCA-1 can support up to eight controllers. The maximum number 
of tape drives each controller supports varies with different tape drive 
models. The maximum transfer rate between the TCA-1 and a tape drive 
will also vary with different tape drive models. The word width of each 
transfer is 8 bits. 

All data transfers to or from the TCA-1 are checked for data errors. Data 
transfers between the TCA-1 and either the mainframe or the SSD-E are 
protected by SECDED. Data transfers between the TCA-1 and 
peripheral devices are checked for parity errors. 



Workstation Interfaces 



The two WINs communicate with the workstations: one WIN connects 
to the operator workstation; the second WIN connects to the maintenance 
workstation. Each WIN has a 6-Mbyte/s channel pair; each pair consists 
of an input and an output channel. Each channel contains 16 data bits 
and 4 parity bits for data error detection. 



The workstations communicate with the entire IOS-E through the WINs. 
Each workstation can send WIN commands that affect the entire IOS-E, 
a single I/O cluster, or a single I/O processor. The workstations can 
master clear the entire IOS-E, an individual I/O cluster, or an individual 
I/O processor. They can also transfer data to or from any IOP, deadstart 
an IOP, and monitor IOPs for errors. 
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The IOPs can send requests to the WINs to transfer data to or from a 
workstation. However, a workstation receiving a request must send a 
specific command to the requesting IOP before the transfer can begin. 



PINT 



The Programmable Interrupt (PINT) enables any IOC in the IOS-E to 
interrupt any CPU in the mainframe. The PINT receives a 16-bit 
Programmed Interrupt signal from each IOC. Each bit in the 
Programmed Interrupt signal corresponds to a CPU within the 
mainframe. For example, setting bit 2 1 interrupts CPU 1 in the 
mainframe. 



HR-04028 3-7 



IOS Model E Specifications 







iBililiiliMil 
"i^'' .* .■'.-•••- ^ 



ilglllfllB 

/ x , ':■•' 'K* ' 

ttk*'. i-- ' : :.-JI 



RESEARCH. INC 



HR-04028 



3-9 



IOS-E Specifications 



General Specifications 

I/O clusters 2 to 8 

Workstation interfaces 2 

Clock speed 160 MHz (6.25 ns)' 



I/O Cluster 

I/O processors: 

IOPMUX 1 

EIOP 4 

Word width 16 bits (1 parcel) 

Local memory size 64 Kparcels 

I/O buffers 16 

Word width 64 bits 

Size 64 Kwords 

(256 Kwords in development) 

I/O channels (CPU connection): 

Low-speed channel pair 1 

Operation full duplex 

Channel width 16 bits (lparcel) 

Transfer rate 6 Mbytes/s 

Data protection parity 

High-speed channel pairs 2 

Operation full duplex 

Channel width 64 bits 

Transfer rate 200 Mbytes/s 

Data protection SECDED 

Channel adapters 16 



Channel Adapters 

CCA-l: 

Operation full duplex 

Word width 

I/O buffer side 64 bits 

External device side 16 bits 

Transfer rate 6 or 12 Mbytes/s 

Data protection 

Mainframe to CCA-l SECDED 

CCA-l to external interface . . parity 
Associated peripheral 

devices front-end interfaces 

Maximum number of peripheral devices 
per CCA-l 1 



Channel Adapters (continued) 

DCA-l: 

Operation half duplex 

Word width: 

I/O buffer side 64 bits 

External device side 16 bits 

Transfer rate 12 Mbytes/s 

Data protection: 

Mainframe to DCA-l SECDED 

DCA-l to external interface . . parity 
Associated peripheral 

devices DD-40, DD-41 

and DD-49 disk drives 
Maximum number of peripheral 

devices per DCA-l one DD-49, 

two DD-40s, 
or two DD-41s 

DCA-2: 

Operation half duplex 

Word width: 

I/O buffer side 64 bits 

External device side 16 bits 

Transfer rate 24 Mbytes/s 

Data protection: 

Mainframe to DCA-2 .... SECDED 
DCA-2 to external 

interface parity and CRC 

Associated peripheral devices DD-60 and 
DD-61 disk drives 
Maximum number or peripheral 

devices per DCA-2 eight DD-60s 

or eight DD-61s 

HCA-3 (HIPPI input channel): 

Operation simple duplex 

Word width: 

I/O buffer side 64 bits 

External device side 32 bits 

Transfer rate 100 Mbytes/s 

Data protection: 

Mainframe to HCA-3 .... SECDED 
HCA-3 to external interface . . parity 
Associated peripheral 

devices All HIPPI devices 

Maximum number of peripheral 

devices per HCA-3 1 
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Channel Adapters (continued) 

HCA-4 (HIPPI output channel): 

Operation simple duplex 

Word width: 

I/O buffer side 64 bits 

External device side 32 bits 

Transfer rate 100 Mbytes/s 

Data protection: 

Mainframe to HCA-4 SECDED 

HCA-4 to external interface . . parity 
Associated peripheral 

devices all HIPPI devices 

Maximum number of peripheral 

devices per HCA-4 1 

TCA-1: 

Operation simple duplex 

Word width: 

I/O buffer side 64 bits 

External device side 8 bits 

Transfer rate 1.5 to 4.5 Mbytes/s 



Data protection: 

Mainframe to TCA-1 SECDED 

SSD to TCA-1 SECDED 

TCA-1 to external device .... parity 
Associated peripheral 

devices IBM compatible tapes 

Maximum number of tape 
controllers per TCA-1 8 



Physical Description of an 
IOS Chassis 

Height 76.25 in. (194 cm) 

Width 46 in. (117 cm) 

Depth 73.5 in. (187 cm) 

Weight 7,695 lbs (3,182 kgs) 

Floor loading . . 520 lbs/ft 2 (2,538 kg/m 2 ) 
Access requirements 3 ft (0.9 m) 

on all sides 
Heat dissipation to air 3.15 kW 

(maximum) 
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4 SSD SOLID-STATE STORAGE DEVICE 



SSD-E Memory 



The Cray Research SSD solid-state storage device model E (SSD-E) is 
an optional high-performance device used for temporary data storage. 
The SSD-E functions like a disk drive. Because of its fast access time, 
fast transfer rates, and large storage capacity, the SSD-E enhances the 
performance of a Cray Research computer system by significantly 
reducing I/O processing time. The storage medium in an SSD-E is 
solid-state, dynamic random access memory (DRAM) chips rather than 
magnetic film. The transfer rates to and from the SSD-E are 
considerably faster than that of a disk drive. Data sets for the SSD-E are 
identical to data sets for disk drives, providing portability and flexibility. 

The mainframe, I/O subsystem model E (IOS-E), and maintenance 
workstation model E (MWS-E) can transfer data to or receive data from 
the SSD-E. The SSD-E can only respond to transfer requests from these 
devices; the SSD-E cannot initiate a transfer. 

The following subsections define the SSD-E memory, mainframe and 
SSD-E transfers, IOS-E and SSD-E transfers, and MWS-E and SSD-E 
transfers. A specification sheet is included at the end of this section. 



Each word contains 72 bits: 64 data bits and 8 check bits. The number of 
words varies, ranging from 128 Mwords to 2 Gwords. Table 4-1 lists the 
different memory sizes. 



Table 4-1. SSD-E Memory Sizes 



Model 


Memory Size 


SSD-128 


1 28 Mwords 


SSD-256 


256 Mwords 


SSD-512 


512 Mwords 


SSD-1024 


1 Gword 


SSD-2048 


2 Gwords 
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To protect data, SSD-E memory uses single-error correction/double-error 
detection (SECDED t) logic. 

When data is written into SSD-E memory, the SECDED logic generates 
a checkbyte (an 8-bit Hamming code) for each data word. The 
checkbyte and 64-bit data word are stored in the SSD-E memory. 

When a word is read from SSD-E memory, a new checkbyte is generated 
for the data word. The new checkbyte is compared to the stored 
checkbyte. The result of the comparison is called the syndrome code. If 
the syndrome code equals 0, no data bits were altered, and the word is 
passed on to the external device. 

If an error occurred, the SECDED logic analyzes the syndrome code to 
determine the number of altered data bits. If only a single data bit was 
altered, the SECDED logic toggles the bit to the correct state and passes 
the corrected word out to the external device. If two data bits were 
altered, the SECDED logic cannot correct the word, but it can detect the 
failure. If more than 2 bits are in error, the results are unpredictable. A 
message is sent to the error logger for all detected errors. 



SSD-E and Mainframe Data Transfers 

Data transfers between the SSD-E and the mainframe's central memory 
use very high-speed (VHISP) channels. Each VHISP channel 
simultaneously transfers two 64-bit words plus two 8-bit checkbytes. 
Each VHISP channel transfers data at the rate of 1,800 Mbytes/s. 

The quantity of VHISP channels can range from one to four, depending 
on the quantity of mainframe CPUs. Typically, there is one VHISP 
channel for each CPU pair. The CRAY Y-MP C90 can support eight 
VHISP channels with 16 CPUs. Because an SSD-E can support 
4 VHISP channels, a second SSD-E is required to use all eight VHISP 
channels from the mainframe. 

To protect data, all VHISP channels use SECDED logic. This SECDED 
logic operates the same as the SECDED logic on the SSD-E memory. 

Data transfers between the mainframe and the SSD-E are done in 
64-word blocks. Individual words are not accessible by the mainframe. 
To read a particular word, an entire block is transferred and the word 
must be selected using software methods similar to disk storage data 
handling methods. 



* Hamming, R. W. "Error Detection and Correcting Codes." Bell System Technical 
Journal. 29.2 (1950): 147-160. 
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All VHISP channels operate under mainframe program control. 
Programming a data transfer requires three parameters: the SSD-E 
starting address, the mainframe's central memory starting address, and a 
block length. The block length specifies how many 64-word blocks to 
transfer. The maximum block length is 16,777,216 which yields a 
maximum transfer length of 1,073,741,824 words. 



SSP-E and IOS-E Data Transfers 

Data transfers between the SSD-E and an I/O buffer in the IOS-E use 
high-speed (HISP) channel pairs. Each HISP channel pair consists of an 
input and output channel, both of which may be active simultaneously. 
Each channel is 64 bits wide and contains 8 check bits. Each channel 
transfers data at the rate of 200 Mbytes/s. 

The quantity of HISP channel pairs ranges from two to eight. Typically, 
there is one HISP channel pair for each I/O cluster in the IOS-E. 

To protect data, all HISP channels use SECDED logic. This SECDED 
logic operates the same as the SECDED logic on the SSD-E memory. 

Data transfers between the IOS-E and the SSD-E are done in 64-word 
blocks. Individual words are not accessible by the IOS-E. To read a 
particular word, an entire block is transferred and the word must be 
selected using software methods similar to disk storage data handling 
methods. 

All HISP channel pairs operate under IOS-E program control. 
Programming a data transfer requires three parameters: the SSD-E 
starting address, the selected I/O buffer's starting address, and a block 
length. The block length specifies how many 64-word blocks to transfer. 
The maximum block length is 1,024, which yields a maximum transfer 
length of 65,536 words. 
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General Specifications 

Storage word size 72 bits 

Data block size 64 decimal words 

per transfer 

Maximum block operation: 

16,777,215 decimal blocks (77777777 8 ) 

for VHISP channels 
256 decimal blocks (377 8 ) 

for HISP channels 

Storage capacity: 

128 Mwords - 1 Mbit DRAM 
256 Mwords - 1 Mbit DRAM 
512 Mwords - 1 Mbit DRAM 

1 Gword - 4 Mbit DRAM 

2 Gword - 4 Mbit DRAM 

Memory configurations: 

128 Mword - 1 section, 4 groups, 

1 Mbit DRAMs 
256 Mword - 2 section, 8 groups, 

1 Mbit DRAMs 
512 Mword - 4 section, 16 groups, 

1 Mbit DRAMs 

1 Gword - 2 section, 8 groups, 
4 Mbit DRAMs 

2 Gword - 4 section, 16 groups, 
4 Mbit DRAMs 

Maximum band width: 

100 Gbits/s reading and writing one word 
per group/clock period 

User ports: 

4 VHISP channel pairs 1,800 Mbytes/s 

each 
8 HISP channel pairs 200 Mbytes/s each 

Data protection: 

Single-error correction/double-error 
detection (SECDED) before and after 
storage and on all user channels. 



Physical Description an SSD-E 
Chassis 

Height 76.25 in. (194 cm) 

Width 46 in. (117 cm) 

Depth 73.5 in. (187 cm) 

Weight 7,695 lbs (3,182 kgs) 

Floor loading . . 520 lbs/ft 2 (2,538 kg/m 2 ) 
Access requirements 3 ft (0.9 m) 

on all sides 
Heat dissipation to air 3.15 kW 

(maximum) 
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The following subsections describe the major components of the disk 
drives and various network interfaces used with the CRAY Y-MP C90 
computer system. 



Disk Controller Units and Disk Storage Units 



Disk systems provide long-term data storage for a Cray Research 
computer system. Components of the disk system include disk channel 
adapters and disk drives. DCA-1 disk channel adapters control one or 
two DD-40 or DD-41 disk drives. DCA-2 disk channel adapters control 
from one to eight DD-60 or DD-61 disk drives. Refer to Section 3 of 
this manual for more information on channel adapters. 



Disk Drives 



DD-60 Disk Drive 



The disk drives store data on magnetic disks. Typically, a disk drive 
consists of several rotating platters. Data is accessed by read and write 
heads organized into groups. Heads are controlled and positioned by one 
or more head actuator (servo) mechanisms on the disk cylinders. The 
following subsections describe specific disk drives. 



One DD-60 disk drive consists of a single-sealed head disk assembly 
(HDA). The HDA contains 2 sets of 9 parallel read and write heads, 1 
servo head, 11 eight-inch rotating platters, and 20 thin film media 
surfaces. 

One set of nine parallel heads is used at a time for data transfers. Eight 
of the heads are used for data and the ninth head is used for parity. The 
heads can be positioned over 2,608 user-accessible locations on the 
surface of each platter. Each location is called a cylinder. The DD-60 
determines which cylinder the heads are on by reading the information 
under the servo head. 
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When the heads are stationary, the area under one head during one 
complete revolution of the platter is called a track. The DCA-2 disk 
channel adapter combines eight tracks of data (one from each head) into 
one logical track. A logical track contains 23 sectors where data can be 
stored and from which data can be retrieved by the operating system. 

Data in one sector is called a data block. One data block consists of 
2,048 64-bit words of IOP data plus verification and error-correcting 
data. Data is transferred between the disk surface and I/O buffer in the 
IOP in blocks of this fixed size. 

One DE-60 disk enclosure cabinet contains a maximum of nine DD-60 
disk drives. Eight of the disk drives store system data, and the ninth disk 
drive is a spare. The DCA-2 disk channel adapter in the IOP manages 
control signals and protocol for the individual disk drives in a DE-60. 

The DCA-2 performs the following functions: 



Controls up to eight DD-60 disk drives (daisy chain configuration) 

Passes control functions to the selected drives 

Receives status from the drives 

Generates codes for correcting write data errors 

Checks read data correcting codes and corrects read data when 
necessary 



Initially, a factory flaw table is used to locate media flaws on the surface 
of a platter. If additional flaws are found, diagnostic programs determine 
the location and width of the flaw. These flaws, which are identified 
during surface analysis, are avoided during read and write operations. A 
defect parameter in the sector ID field contains information on the 
location of the flaw. 

Under control of a DCA-2, a DD-60 writes data into a flawed sector until 
the media defect location is reached. While the read and write head of 
the DD-60 is over the media defect, it writes a copy of the previous 18 
bytes of data. Then the DD-60 resumes writing valid data in the flawed 
sector. 

When reading data from a flawed sector, a DD-60 reads the defect 
address to find the beginning of the 18-byte field of repeated data. When 
the read and write head of the DD-60 reaches the field of repeated data, 
the DCA-2 does not accept the repeated data. The drive resumes its 
normal read operation after the head passes the defect field. 
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DD-60 Single-port Configuration 



A single-port configuration connects one DCA-2 to one disk drive. In 
this configuration, the channel accesses information at the maximum data 
transfer rate of the disk drive. Because only one disk drive connects to 
the channel, the storage capacity of the channel is the storage capacity of 
the disk drive. 

Figure 5-1 shows eight disk drives, each connected in a single-port 
configuration. One DCA-2 connects to the input of port A, and a 
terminator connects to the output of port A for each disk drive. Port B is 
not used. 
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Figure 5-1. DD-60 Single-port Configurations 
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DD-60 Daisy Chain Configuration 



A DD-60 daisy chain configuration (refer to Figure 5-2) consists of a 
maximum of eight DD-60 disk drives connected to one channel. The 
storage capacity per channel is multiplied by the number of drives 
connected in the daisy chain; however, only one DD-60 can transfer data 
to the DCA-2 at a time. 
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Figure 5-2. DD-60 Daisy Chain Configuration 
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DD-60 Alternate-path Configuration 



An alternate-path configuration connects two DCA-2s to a maximum of 
eight disk drives. In this configuration, the two DCA-2s connect to the 
same set of disk drives on two separate daisy chains. Special software 
modifications must be made when the disk drives are cabled in a 
redundant configuration. 

Figure 5-3 shows eight disk drives connected in redundant 
configurations. Each disk drive has one DCA-2 connected to port A 
(primary path) and another DCA-2 connected to port B (redundant path). 



DE-60 
Rear View 




t Input/Output Processor Multiplexer 



A-10121 



Figure 5-3. DD-60 Alternate-path Configurations 
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DD-61 Disk Drive 



The DD-61 disk drive is a 19-head serial disk drive similar to the DD-60. 
During data transfers to and from the DCA-2, one head transfers data at a 
time. The DD-61 has a sustained transfer rate of 2.6 Mbytes/s and a 
storage capacity of 2.23 Gbytes. 

One DD-61 disk drive includes a sealed HDA. The HDA contains 
19 serial read and write heads, 1 servo head, 11 eight-inch rotating 
platters, and 20 thin film media surfaces. 

The heads can be positioned over 2,608 user-accessible locations on the 
surface of each platter. Each location is called a cylinder. The DD-61 
determines which cylinder the heads are on by reading the information 
under the servo head. 

When the heads are stationary, the area under one head after one 
complete revolution of the platter is called a track. A track contains 11 
sectors where data can be stored and from which data can be retrieved by 
the operating system. 

Data in one sector is called a data block. One data block consists of 512 
64-bit words of IOP data plus verification and error-correcting data. 
Data is transferred between the disk surface and the I/O buffer in the IOP 
in blocks of this fixed size. Sectors may be chained during both read and 
write operations. 

One DE-60 disk enclosure cabinet contains a maximum of eight DD-61 
disk drives. The DCA-2 disk channel adapter in the IOP manages 
control signals and protocol for the individual disk drives in a DE-60. 

The DCA-2 performs the following functions: 

• Controls a maximum of eight DD-61 disk drives (daisy chain 
configuration) 

• Passes control functions to the selected drives 

• Receives status from the drives 

• Generates codes for correcting write data errors 

• Checks read data correcting codes and corrects read data when 
necessary 
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Initially, a factory flaw table is used to locate media flaws on the surface 
of a platter. If additional flaws are found, diagnostic programs determine 
the location and width of the flaw. These flaws, which are identified 
during surface analysis, are avoided during read and write operations. A 
defect parameter in the sector ID field contains information on the 
location of the flaw. 

Under control of a DCA-2, a DD-61 writes data into a flawed sector until 
the media defect location is reached. While the read and write head of 
the DD-61 is over the media defect, it writes a copy of the previous 18 
bytes of data. Then the DD-61 resumes writing valid data in the flawed 
sector. 

When reading data from a flawed sector, a DD-61 reads the defect 
address to find the beginning of the 18-byte field of repeated data. When 
the read and write head of the DD-61 reaches the field of repeated data, 
the DCA-2 does not accept the repeated data. The drive resumes its 
normal read operation after the head passes the defect field. 

DD-61 disk drives also connect in the same single-port, daisy chain, or 
alternate-path configurations as DD-60s. For information on the daisy 
chain and alternate-path configurations, refer to the "DD-60 Disk Drive" 
subsection in this section. 



DS-41 Disk Subsystem 



The DS-41 disk subsystem consists of the the DC-41 disk controller and 
the DD-41 disk drive. Each DD-41 disk drive has four spindles that 
operate as a single logical disk drive unit under the control of one DC-41 
disk controller. Each physical disk drive (spindle) consists of 9 rotating 
platters and 15 recording surfaces. Data is accessed by 15 read and write 
heads. A servo mechanism, which controls the read and write heads, 
positions the heads over one of 1,635 disk cylinders. Data is stored and 
retrieved from the recording surface of the disk drive by any of the 15 
read and write heads. 

The recording surface available to each head is called a disk track, which 
is the basic storage unit reserved by the operating system. Each disk 
track has 48 sectors where data can be stored and retrieved by the 
operating system. The data in one sector is called a data block. One data 
block consists of 2,048 16-bit parcels (512 64-bit words) of IOP data 
plus verification and error-correcting data. Data can be transferred 
between the disk surface and local memory in the IOP only in blocks of 
this fixed size. Sectors may be chained for both read and write 
operations. 
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A DC-41 disk controller provides interface logic to adapt DCA-1 signals 
and protocol for individual disk drive units, to handle routing among the 
drives, and to buffer data from the four spindles in a full-track buffer. 
The interface logic in one DC-41 disk control unit performs the 
following functions: 



Controls up to 8 spindles (two DD-41 disk drives) 

Passes control functions to the selected drives 

Passes status from the drives to the DCA-1 

Buffers read and write data for transfers between DCA-1 and the 
disk drives 

Generates error-correcting codes for write data 

Checks read data correcting codes and corrects read data when 
necessary 

Controls distribution of read and write data over 48 sectors per 
track using 12 sectors from each of the four spindles in a logical 
drive 



Under the control of a DC-41, a disk drive writes data to a flawed sector 
until a defect location is reached. In the area starting at a defect address, 
a disk drive writes a 16-byte field of O's. A disk drive resumes writing 
data in a flawed sector following this field of O's. When reading data 
from a flawed sector, a disk drive reads the defect address to find where 
the field of 16 bytes of O's starts. When a drive's read and write head 
reaches the field of O's, the head skips over the flawed area of the sector 
overwritten by the field of O's, omitting them from the read data. The 
drive resumes its normal read operation after the head passes the defect 
field. 

A factory flaw table is used initially; if any additional flaws are found, 
diagnostic programs determine where the flaw is located in the sector 
and how wide it is. Defective areas of the recording surface, which are 
identified during surface analysis, are avoided during read and write 
operations. A defect parameter becomes part of the sector ID field when 
the drive is formatting. 
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DS-41 Single-port Configuration 



A single-port configuration connects one DC-41 to one DD-41. In this 
configuration, the channel accesses information at the maximum 
data-transfer rate of the DD-41. Because only one disk drive connects to 
the channel, the storage capacity of the channel is the storage capacity of 
the disk drive. 



DS-41 Daisy Chain Configuration 



A daisy chain configuration connects one DC-41 to two DD-41s. The 
channel data-storage capacity is the total storage capacity of both disk 
drives. Because only one disk drive can transfer data to the DC-41 at a 
time, the channel data transfer rate is the maximum transfer rate of one 
disk drive. 



DS-41 Alternate-path Configuration 



An alternate-path configuration connects two DC-41s to a maximum of 
two disk drives. In this configuration, the two DC-41s connect to the 
same disk drives on two separate daisy chains. Special software 
modifications must be made when the disk drives are cabled in an 
alternate-path configuration. 



DS-41 A Disk Subsystem Field-upgradable Configurations 



A field-upgradable DS-41A disk subsystem configuration has the 
following components: 

• One disk drive cabinet (DE-41) 

• One DD-41 disk drive (housed in the DE-41) 

• One disk controller cabinet (DCC-2A) 

• One DC-40 disk controller (housed in the DCC-2A) 

A DS-41A can be upgraded by adding a DS-41B package. A DS-41B 
consists of one DD-41, one DC-41, and all cabling required to install the 
additional drive and controller in a DS-41A disk subsystem. Up to three 
DS-41Bs can be installed in a DS-41A disk subsystem. 



DS-40 Disk Subsystem 



The DS-40 disk subsystem comprises the following components: the 
DD-40 disk drive, the DC-40 disk control unit (DCU), and the disk 
controller cabinet (DCC-2). The DD-40 contains four disk drives and 
required interface logic to operate as a single disk drive unit. The DC-40 
is housed in the DCC-2, which is separate from the DD-40 disk drives. 
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Refer to "DS-40 and DS-40D Disk Subsystem Specifications" at the end 
of this section for exact configuration information. Each physical disk 
drive (spindle) consists of six rotating platters and ten recording surfaces. 
Data is accessed by 19 read and write heads that are controlled and 
positioned by a servo mechanism to one of 1,418 disk cylinders. 

The recording surface available to each head is called a disk track, which 
is the basic storage unit reserved by the operating system. Each disk 
track has 48 sectors where data can be recorded and read. The data in 
one sector is called a data block. One data block consists of 2,048 16-bit 
data parcels (512 64-bit words) plus verification and error-correcting 
data. Data can be transferred between the disk surface and I/O buffer in 
the IOP only in blocks of this fixed size. Sectors may be chained for 
both read and write operations. 

Interface logic in the DC-40 also adapts the DCA-1 signals and protocol 
to the individual disk drive units, manages routing among the drives, and 
buffers the data from the four drives in a full-track buffer. 

The interface logic in one DC-40 disk control unit performs the 
following functions: 



Controls up to 8 spindles (two DD-40 disk storage units) 

Passes control functions to the selected drives 

Passes status from the drives to the DCA-1 

Buffers read and write data for transfers between DCA-1 and the 
disk drives 

Generates error-correcting codes for write data 

Checks read data correcting codes and corrects read data when 
necessary 

Controls distribution of read and write data over 48 sectors per 
track using 12 sectors from each of the four spindles in a logical 
drive 



Under the control of a DC-40, a disk drive writes data onto a flawed 
sector until a defect location is reached. In the area starting at a defect 
address, a disk drive writes a 16-byte field of 0's. A disk drive resumes 
writing data in a flawed sector following this field of 0's. When reading 
data from a flawed sector, a disk drive reads the defect address to find 
where the field of 16 bytes of 0's starts. When a drive's read and write 
head reaches the field of 0's, the head ignores the field of 0's, omitting 
the field of 0's from the read data. The drive resumes its normal read 
operation after the head passes the defect field. 
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A factory flaw table is used initially; if any additional flaws are found, 
diagnostic programs determine where the flaw is located in the sector 
and how wide it is. Defective areas of the recording surface, which are 
identified during surface analysis, are avoided during read and write 
operations. A defect parameter becomes part of the sector ID field when 
the drive is formatting. 



DS-40 Single-port Configuration 



A single-port configuration connects one DC-40 to one DD-40. In this 
configuration, the channel accesses information at the maximum 
data-transfer rate of the DD-40. Because only one disk drive connects to 
the channel, the storage capacity of the channel is the storage capacity of 
the disk drive. 



DS-40 Daisy Chain Configurations 



A daisy chain configuration connects one DC-40 to two DD-40s. The 
channel data-storage capacity is the total storage capacity of both disk 
drives. Because only one disk drive can transfer data to the DC-40 at a 
time, the channel data-transfer rate is the maximum transfer rate of one 
disk drive. 



DD-49 Disk Drive 



The DD-49 disk drive consists of nine rotating platters. Data is accessed 
by 32 read and write heads organized into eight groups, four read and 
write heads per group. Heads are controlled and positioned by two 
identical head actuator (servo) mechanisms to one of 886 disk cylinders. 
The servo mechanisms are identified as Servo-A and Servo-B. 

The recording surface available to each head group is called a disk track, 
and is the basic storage unit reserved by the operating system. Each disk 
track has 42 sectors (and two spare sectors) where data is recorded and 
read back. The data in one sector is called a data block and consists of 
2,048 16-bit parcels (512 64-bit words) of IOP data plus verification and 
errorl-correction data. Data can be transferred between the disk surface 
and I/O buffer in the IOP only in blocks of this fixed size. Sectors may 
be chained for both read and write operations. 

The DD-49 disk drive responds to commands from the IOS-E through a 
microprocessor unit card (MPU card) that contains a 68000-type 16-bit 
microprocessor and a second processor called the supervisor. 

The DD-49 disk drive provides a sector-slipping mechanism that allows 
a full track to remain available to the system even after one or two 
sectors of the track become flawed. Sectors are slipped from the flawed 
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sector to the end of the track. In general, if sector n becomes flawed, 
sectors n through 41 of the track are slipped, and the data contained in 
sectors n through 41 must be re-created. If a second sector in a track 
becomes flawed, the process is repeated. If a third sector in a track 
becomes flawed, the operating system must mark the sector as 
unavailable. Sector slipping takes place offline. A hardware diagnostic 
utility reformats the track that contains slipped sectors. 

A DD-49 disk drive has 44 sectors per track, although only 42 sectors are 
used for data. Under normal circumstances, the two spare sectors are not 
used. If one of the data sectors becomes flawed, however, a spare sector 
is used as a data sector. 

Refer to "DD-49 Disk Drive Specifications" at the end of this section for 
configuration information. 



Network Interfaces 



The CRAY Y-MP C90 computer system can be connected to a wide 
variety of computer systems (often referred to as "front-end systems") 
and networks through a CCA-1 channel adapter in the IOS. This enables 
users of non-Cray Research computer systems to utilize the 
CRAY Y-MP C90 system's extraordinary computational power. The 
following subsections describe the methods used to interface the 
CRAY Y-MP C90 computer system with other computer systems and 
networks. 



FEI-1 Front-end Interface 



The FEI-1 front-end interface provides communication between a 
CCA-1 channel adapter in the IOS and many different types of front-end 
computer systems. The FEI-1 compensates for differences in channel 
widths, machine word size, electrical logic levels, and control protocols. 
Refer to "Front-end Interface Specifications" at the end of this section 
for a complete list of compatible mainframes and minicomputers. 

The FEI-1 is housed in a stand-alone cabinet located near the host 
computer. The cabinet is air cooled and operates directly from the AC 
power mains; power consumption varies with each type of interface. 
Internal power supplies provide all required voltages. Cabinet grounding 
is flexible and can be configured to specific site requirements. 

Each FEI-1 contains two or more logic modules and the appropriate 
cabling. The hardware logic contained in these modules performs all 
command translation and protocol conversion needed to transfer data; 
these operations are invisible to both the front-end and Cray Research 
programmer. 
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Fiber-optic Link 



The Cray Research fiber-optic link (FOL) is used as a channel extender 
for 6-Mbyte/s (16-bit asynchronous) channels. It connects the 
conventional wire cable from the CCA-1 channel adapter in the IOS to 
the wire cable from an FEI-1. Fiber-optic cabling enhances the 
performance of the FEI-1 by eliminating the occasional problems related 
to system isolation, including induced noise, variable ground potentials, 
and radio frequency radiation found in wire cabling. Fiber-optic cabling 
overcomes these problems and, in addition, provides a secure link for 
transmitting data over distances up to 4,000 m. 

Fiber-optic technology uses thin glass fibers (optical fibers) to transmit 
information from one location to another. Optical fibers are used in 
place of wire cabling, and light signals replace electrical charges sent 
over conventional wire cabling. 

The FOL operates by converting digital data into electrical pulses. The 
electrical signal is used to modulate light coming from a light-emitting 
diode (LED). The resulting light pulses, which are of the same duration 
as electrical pulses, are sent over the fiber-optic cable. At the receiving 
end, the light pulses are converted back into electrical pulses, which are 
then demodulated to recover the digital data. As with a standard FEI-1, 
these operations are invisible to both the front-end and Cray Research 
programmer. 

The fiber-optic FEI-1 cabinet is similar to the standard FEI-1 cabinet. It 
is modified with an attached compartment to hold the fiber-optic 
modules. In addition to this FEI-1 cabinet, another smaller cabinet 
containing more fiber-optic modules is located next to the Cray Research 
mainframe. These special fiber-optic modules modulate and demodulate 
the signals between the Cray Research mainframe and the front-end 
system. 



FEI-3 Front-end Interface 



The FEI-3 is a group of similar front-end interfaces that enables 
VME-based microcomputers and workstations to communicate with a 
CCA-1 channel adapter in the IOS over a standard 6-Mbyte/s I/O 
channel. Specific FEI-3 applications depend on the capabilities of the 
VME workstations or microcomputers. For example, Cray Research 
uses the FEI-3 to connect systems to an operator workstation. 
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The following list contains other possible FEI-3 applications: 

• To connect to a communications gateway for Control Subsystem 
Networks or other networks 

• To connect to a graphics output processor or device 

• To connect to a remote Cray Research station 

Each FEI-3 interface consists of two VME-compatible circuit boards that 
install into the target VME system, plus supporting cables and software 
drivers. The customer furnishes and provides support for the target VME 
system. 

The VMEbus is an industry standard that specifies the electrical and 
mechanical rules for a microcomputer backplane. Many popular 
microcomputer systems are based on the VMEbus. 

Direct Network Connections 

The CCA-1 channel adapter in the IOS supports direct connection to 
network adapters such as Network Systems Corporation (NSC) 
HYPERchannel adapters, Computer Network Technology LANlord 
adapters, and others. 

High Performance Parallel Interface (HIPPI) 

The Cray Research High Performance Parallel Interface (HIPPI) is an 
external channel that provides high-speed communications between 
HCA-3 and HCA-4 channel adapters in the IOS and peripheral 
equipment, such as network adapters, raster display devices, and mass 
storage systems. HIPPI conforms to industry standards and provides 
32-bit parallel data transfers at the rate of 100-Mbytes/s. 

HIPPI conforms to the preliminary draft proposed American National 
Standard (DPANS) HIPPI revision 7.0. The HIPPI proposal is based on 
an original design by engineers at Los Alamos National Laboratories. 

HIPPI is a simplex channel that transmits data in one direction; it is 
usually configured in pairs for full duplex operation. Driver software 
enables users to operate the HIPPI directly as a raw device or indirectly 
through Transmission Control Protocol/Internet Protocol (TCP/IP) or 
User Datagram Protocol (UDP) sockets, Remote Procedure Call (RPC) 
libraries, and Network File Systems (NFSs) between Cray Research 
computer systems. 
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Because HIPPI conforms to industry standards, it can be configured with 
many types of devices and applications that require high-speed transfer 
of large amounts of data. 

The following list contains other HIPPI applications: 

• Distributed applications. The speed of HIPPI makes more 
applications suitable for distributed processing. Users can link 
multiple Cray Research computer systems for maximum 
supercomputer performance. 

• Raster graphics. Real-time animated graphics are possible when 
HIPPI is combined with a compatible high-speed frame buffer. 
Existing devices have delivered up to 60 frames per second on a 
512-by-512 raster of 24-bit pixels. 



DEC VAX Supercomputer Gateway 

Digital Equipment Corporation (DEC) offers a VAX Supercomputer 
Gateway to enable direct connection between the DEC VAXcluster 
environment and a CCA-1 channel adapter in the IOS. 
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DD-60 Features 

Transfer rate: 

Sustained 16 to 20 Mbytes/s 

Burst rate 24 Mbytes/s 

Storage capacity 

One DD-60 1.96 Gbytes 

Total data sectors 119,968 

Logical sector size (64-bit words) . . . 2,048 

Total data words 245,694,464 

Typical position delays 

Single track 2.2 ms 

Average 12 ms 

Full stroke 26 ms 

Average latency 8.3 ms 

DE-60 Power and Cooling 
Specifications (eight DD60s) 

Required power 3 phase, 208 Vac, 

50 or 60 Hz, 20 A per phase 

or 3phase, 400 Vac, 

50 or 60 Hz, 8.5 A per phase 

Heat load 7,500 Btu/hr (2,200 W) 

Type of cooling air cooled 



DE-60 Physical Description 

Dimensions 

Height 61.7 in. (156.7 cm) 

Width 24 in. (61 cm) 

Depth 39.2 in. (99.5 cm) 

Floor space 6.5 ft 2 (0.6 m 2 ) 

Weight 

(all eight DD-60s) .... 1,015 lbs (460 kg) 

DE-60 Placement and Cabling 
Specifications 

Minimum clearance 

Sides 12 in. (25 cm) 

Front 36 in. (91 cm) 

Back 36 in. (91 cm) 

Length of power cable 6 ft (1.8 m) 

Maximum length of 

data cables 82 ft (25 m) 
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DD-61 Features 

Transfer rate: 

Sustained 2.3 to 2.6 Mbytes/s 

Burst rate 3.0 Mbytes/s 

Storage capacity 

One DD-61 2.23 Gbytes 

Total data sectors 545,072 

Logical sector size (64-bit words) 512 

Total data words 279,076,864 

Typical position delays: 

Single track 2.2 ms 

Average 12 ms 

Full stroke 26 ms 

Average latency 8.3 ms 

DE-60 Power and Cooling 
Specifications (eight DD60s) 

Required power 3 phase, 208 Vac, 

50 or 60 Hz, 20 A per phase 

or 3 phase, 400 Vac, 

50 or 60 Hz, 8.5 A per phase 

Heat load 3,900 Btu/hr (1,144 W) 

Type of cooling air cooled 



DE-60 Physical Description 

Dimensions 

Height 61.7 in. (156.7 cm) 

Width 24 in. (61 cm) 

Depth 39.2 in. (99.5 cm) 

Floor space 6.5 ft 2 (0.6 m 2 ) 

Weight 

(all eight DD-61s) 900 lbs (408 kg) 

DE-60 Placement and Cabling 
Specifications 

Minimum clearance 

Sides 12 in. (25 cm) 

Front 36 in. (91 cm) 

Back 36 in. (91 cm) 

Length of power cable 6 ft (1.8 m) 

Maximum length of 

data cables 82 ft (25 m) 
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DC-40 Features 

Transfer rate: 

Sustained 9.6 Mbytes/s 

Burst rate 20 Mbytes/s 

Storage capacity 5,200 Mbytes 

DC-40 Power and Cooling 

Required power 208 Vac, 3 phase, 

60 Hz, 60 A 

Type of cooling water cooled 

refrigeration/air cooling 

Water temperature (°F) 40 to 90 

Water temperature (°C) 4.4 to 32.2 

Heat load (to air) 1,330 Btu/hr, 390 W 

Heat rejection 

to water 24,000 Btu/hr, 7,643 W 

DCC-2/DC-40 
Physical Description 

The four DC-40s are housed in a disk control 
cabinet (DCC-2) that contains the power 
control and refrigeration components required 
for the DC-40. 

Floor space 8.7 ft 2 (0.81 m 2 ) 

Weight 1,240 lbs (562 kg) 

Cabinet dimensions: 

Height 60 in. (152 cm) 

Width 31 in. (79 cm) 

Depth 41 in. (104 cm) 

DCC-2/DC-40 
Placement and Cabling 

Minimum clearance: 

Sides 12 in. (30.5 cm) 

Front 36 in. (91.4 cm) 

Back 36 in. (91.4 cm) 

Length of power cable 8 ft (2.4 m) 

Maximum length of 

data cables 50 ft (14.4 m) 

DD-40 Features 

Transfer rate: 

Sustained 9.6 Mbytes/s 

Burst rate 20 Mbytes/s 



DD-40 Features (continued) 

Total data sectors 1,293,216 

Total data words 662,126,592 

Typical position delays: 

Single track 4 ms 

Average 16 ms 

Full stroke 30 ms 

DD-40 Power and Cooling 

Required power 208 Vac, 3 phase, 

60 Hz, 20 A 

Cooling air cooled 

Heat load 8,000 Btu/hr, 2,340 W 

DD-40 Physical Description 

Floor space 7.3 ft 2 (0.68 m 2 ) 

Weight 1,150 lbs (522 kg) 

Cabinet dimensions: 

Height 60 in. (152 cm) 

Width 26 in. (66 cm) 

Depth 41 in. (104 cm) 

DD-40 Placement and Cabling 

Minimum clearance: 

Sides 1 in. (2.5 cm) 

Front 36 in. (91.4 cm) 

Back 30 in. (76.2 cm) 

Length of power cable 6 ft (1.8 m) 

Maximum length of data cables ... 20 ft (6 m) 

The DCC-2 contains four DC-40 disk 
controllers. The DC-40 is a dual-ported 
interface with only one port active at a time. 

Four disk storage units (DSUs) are connected 
to the DCC-2 chassis for the DS-40 Disk 
Subsystem. 

Eight DSUs are connected to the DCC-2 
chassis for the DS-40D disk subsystem, but 
only four DSUs can be active at one time. 
This technique, known as daisy chaining, is 
used to double the capacity of a single 
subsystem from 21 Gbytes to 42 Gbytes; 
doubling the capacity does not double the 
performance because the data path is set at 
9.6 Mbytes/s. 
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DC-41 Features 

Sustained transfer rate 9.6 Mbytes/s 

Storage capacity 4,800 Mbytes 

DCC-2A Power and Cooling 

(four DC-41 s) 

Required power 208 Vac, 3 phase, 

60 Hz, 60 A 

Type of cooling water cooled 

refrigeration/air cooling 

Water temperature (°F) 40 to 90 

Water temperature (°C) 4.4 to 32.2 

Heat load (to air) 1,330 Btu/hr, 390 W 

Heat rejection 

to water 24,000 Btu/hr, 7,643 W 

DCC-2A Physical Description 

(four DC-41 s) 

The DC-41s are housed in a disk control 
cabinet (DCC-2A) that contains the power 
control and refrigeration components required 
for the DC-41. 

Floor space 8.7 ft 2 (0.81 m 2 ) 

Weight 1,240 lbs (562 kg) 

Cabinet dimensions: 

Height 67 in. (170 cm) 

Width 31 in. (79 cm) 

Depth 41 in. (104 cm) 

DCC-2A Placement and Cabling 

Minimum clearance: 

Sides 12 in. (30.5 cm) 

Front 36 in. (91.4 cm) 

Back 36 in. (91.4 cm) 

Length of power cable 8 ft (2.4 m) 

Maximum length of 

data cables 50 ft (14.4 m) 

The DCC-2A contains four DC-41 controllers. 
The DC-41 is a dual-ported interface with 
only one port active at a time. Two DCC-2A 
cabinets are used in a DS-41R disk subsystem 
to provide dual-channel access. 

Storage capacity and transfer rates are the 
same for both DS-41D and DS-41R disk 
subsystems. 



DD-41 Features 

Sustained transfer rate 9.6 Mbytes/s 

Total data sectors 1,175,760 

Total data words 601,989,120 

Typical position delays: 

Single track 5 ms 

Average 16 ms 

Full stroke 30 ms 

DE-41 Power and Cooling 

(four DD-41 s) 

Required power 208 Vac, 3 phase, 

60 Hz, 20 A 

Cooling air cooled 

Heat load 8,000 Btu/hr, 2,340 W 

DE-41 Physical Description 

(four DD-41 s) 

Floor space 7.3 ft 2 (0.68 m 2 ) 

Weight 1,150 lbs (522 kg) 

Cabinet dimensions: 

Height 67 in. (170 cm) 

Width 26 in. (66 cm) 

Depth 41 in. (104 cm) 

DD-41 Placement and Cabling 

Minimum clearance: 

Sides 1 in. (2.5 cm) 

Front 36 in. (91.4 cm) 

Back 30 in. (76.2 cm) 

Length of power cable 6 ft (1.8 m) 

Maximum length of data cables ... 20 ft (6 m) 

Four disk drives are connected to the DCC-2A 
chassis for the DS-41 disk subsystem. Eight 
disk drives are connected to the DCC-2A 
chassis for the DS-41D disk subsystem, but 
only four disk drives can be active at one time. 
This technique, known as daisy chaining, is 
used to double the capacity of a single 
subsystem from 19.2 Gbytes to 38.4 Gbytes; 
daisy chaining does not increase the 
9.6-Mbyte/s transfer rate. 
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DD-49 Features 

Storage capacity 1,200 Mbytes 

Transfer rate: 

Sustained 9.6 Mbytes/s 

Burst rate 12 Mbytes/s 

Total data sectors 297,696 

Total data words 150,420,352 

Typical position delays: 

Single track 2 ms 

Average 16 ms 

Full stroke 30 ms 

Power and Cooling 

Required power 3 phase, 208 Vac, 

60 Hz, 20 A per phase 

Heat load 9,000 Btu/hr (2,640 W) 

Type of cooling air cooled 

Physical Description 

Floor space 7.3 ft 2 (0.68 m 2 ) 

Weight 8441bs(383kg) 

Placement and Cabling 

Minimum clearance 

Sides 12 in. (25 cm) 

Front 36 in. (91 cm) 

Back 30 in. (76 cm) 

Length of power cable 6 ft (1.8 m) 

Maximum length of 

data cables 50 ft (15 m) 
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FEI Features 

Cray Research, Inc. offers hardware interfaces 

and station software to connect the 

CRAY Y-MP C90 system to a wide variety of 

popular computer systems, networks, and 

workstations. 

Mainframes: 

Amdahl 470 series 

CDC 70 

CDC 170 

CDC 180 

CDC 6000 

CDC 7600 

Honeywell 6000 

IBM 360 

IBM 370 

IBM 303x 

IBM 308x 

IBM 43xx 

Siemens 

Unisys 1100/80 series 

Minicomputers and microcomputers: 
Data General ECLIPSE series 
DEC PDP/11 
DEC VAX 11/750 
DEC VAX 11/780 
DEC VAX 11/782 
DEC VAX 11/785 
DEC VAX 8600 
DEC VAX cluster 
Motorola Delta Series microcomputer 

Networks: 

Ethernet (TCP/IP) networks 
Network Systems Corporation 
HYPERchannel 

Workstations: 

Sun-3 (through FEI-3's interface) 



Operating systems: 
Apollo AEGIS 

CDC NOS, NOS/BE, and NOS/VE 
Data General AOS 
Data General RDOS 
DEC VAX/VMS 
Bull HN Information Systems 
IBM MVS and VM 
Unisys 
UNIX 



Physical Description 

Floor space 4.38 ft 2 (3.42 m 2 ) 

Weight 200 lbs (91 kg) 

Height 23 in. (58.4 cm) 



5-28 



HR-04028 



FOL-3 Fiber-optic Link Specifications 



§iMi§!!lil 




RESEARCH, INC 



HR-04028 



5-29 



FOL-3 Specifications 



FOL-3 Description 

The FOL-3 is a fiber-optic connection 
between a Cray Research I/O subsystem (IOS) 
and a front-end interface (FEI). The FOL-3 is 
an alternative to the wire cabling between an 
IOS and an FEI. The FOL-3 is designed 
primarily to increase the maximum cabling 
distance between a Cray Research computer 
system and a front-end computer and to 
provide complete electrical isolation from 
electromagnetic fields. 



The FO cabinet is positioned on top of the 10 
cabinet. The FO cabinet contains an FO 
module that includes the receivers and 
transmitters for the fiber-optic cable and 
power connection for the module. 

The 10 cabinet contains a power supply and 
an 10 module. The 10 module provides an 
interface between the fiber-optic 
receiver/transmitter board and the Cray 
Research 6-Mbyte/s channel. 



FOL-3 Configurations 

The FOL-3 consists of the following 
equipment: 

Fiber-optic (FO) cabinet 

Interface (10) cabinet 

Electrical kit 

Below is an illustration of a general 
configuration of the FOL-3 used with an IOS. 
The dashed line encircles the components that 
comprise the FOL-3. 

At the Cray Research mainframe end (local 
end of the FOL) is a fiber-optic cabinet that 
consists of an 10 cabinet and an FO cabinet. 



At the FEI mainframe end (remote end of the 
FOL) of this link is an FEI cabinet. This 
cabinet consists of an FO cabinet positioned 
on top of an FEI cabinet. The FEI cabinet 
contains the modules necessary to 
communicate with the front-end computer 
system and a Cray Research 10 module. The 
FO cabinet is identical to the FO cabinet at the 
local end of the link. 

The electrical kit contains a Cray Research 10 
module, logic and power interconnections for 
the 10 module, and logic and power 
interconnections for the signal connection to 
the FO module in the FO cabinet. 
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General FOL-3 Configuration 
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The table to the right lists the required FOL-3 
equipment. The number of kits the Cray 
Research customer needs to purchase varies 
for each Cray Research computer system 
depending on the site and system 
configuration. 

The illustration below shows the general 
configuration of the FOL-3 when two Cray 
Research systems are configured together. 
The dotted line encircles the components that 
comprise the FOL-3. 



FOL-3 Equipment List 


Equipment 


Quantity Needed 


Electrical kit 


One kit per FOL-3 


FO cabinet 


Two kits for initial installation; 
one kit thereafter 


10 cabinet 


One kit 
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FOL-3 Connection Between Two Cray Research Computers Systems 



The illustration below shows the FOL-3 
connected to a CRAY Y-MP C90 computer 
system and four front-end computers. The 
6-Mbyte/s channel exiting the IOS connects to 
the 10 interface cabinet. The fiber-optic 
cables exit the 10 interface cabinet and are 
routed to the FEIs. 



The FEIs are connected to the front-end 
computer by the front-end channel. The 
dotted line encircles the components that 
comprise the FOL-3. 
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Mainframe 




A-10400 



FOL-3 Configured with Multiple Front-end Computer Systems 
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The following additional fiber-optic cable 
configurations are possible for the FOL-3: 

3-Mbyte/s, 4-km cable 

6-Mbyte/s, 2-km cable 

The equipment configurations for 2-km and 
4-km cable lengths are identical to those for 
the FOL-3. 

The customer is responsible for supplying and 
installing the fiber-optic cables. A variety of 
cable types and vendors exists. See your local 
Cray Research sales representative for cable 
specifications. 



FOL-3 Features 

The table shown below describes FOL-3 
features. 



Physical Description 

Floor space 4.1 ft 2 (0.38 m 2 ) 

Weight 240 lbs (109 kg) 

Height 27 in. (69 cm) 



FOL-3 Advantages 

The following items are advantages of using 
the FOL-3 as opposed to wire cabling: 

Decreased cost 

Increased security 

Increased cabling distances 

Decreased vulnerability to interference 

Ease of handling 



FOL-3 Features 


Feature 


Description 


Fiber-optic cable length 


3 ft. (0.91 m) to 3,280 ft. (1 ,000 m) 


Power Requirements 


-5.2 V, -2.0 V, 1 00-W total power 


Transfer Rate 


3 Mbytes/s 


Data Protection 


Cyclic redundancy check (CRC) on link data, 
parity generation, and channel data check 


Ground Isolation 


Complete ground isolation between a Cray 
Research computer system and a front-end 
computer. 
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6 SOFTWARE OVERVIEW 



CRAY Y-MP C90 computer systems come with a variety of software, 
including the Cray Research operating system UNICOS. The CF77 
compiling system provides automatic vectorizing, as do the Cray 
Research Standard C and Pascal compilers. Extensive library routines, 
program- and file-management utilities, debugging aids, a powerful Cray 
Research assembly language, and extensive support for industry 
standards are included in the system software. A large number of 
third-party and public-domain application programs also run on Cray 
Research systems. 

CRAY Y-MP C90 computer systems are supported by industry standard 
communications software such as the International Standards 
Organization Open Systems Interconnect (ISO/OSI) protocol and 
Transmission Control Protocol/Internet Protocol (TCP/IP). The 
CRAY Y-MP C90 systems are also supported by Cray Research 
proprietary station products for connecting to other vendors' systems and 
workstations. 



UNICOS Operating System 



CRAY Y-MP C90 computer systems come with the UNICOS operating 
system. The UNICOS operating system is derived from the UNIX 
Laboratories, Inc. UNIX System V operating system. It is also based in 
part on the Fourth Berkeley Software Distribution (BSD), under license 
from The Regents of the University of California. 

The UNICOS operating system provides exceptional problem-solving 
ease; it provides powerful interactive and batch capabilities and multiple 
methods to accomplish a task. It efficiently manages high-speed data 
transfers between the CRAY Y-MP C90 system and peripheral 
equipment. The UNICOS operating system is written in C, a high-level 
language, and is available on all Cray Research systems. 

The UNICOS operating system consists of a kernel plus a large set of 
utilities and library programs. The kernel is a simple structure with short 
and efficient software control paths. The kernel supports many system 
call primitives that library and application programs can use together to 
perform complex tasks. 
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Multiprocessing 



The UNICOS operating system offers a large set of utility programs that 
allows the user to interact with the operating system. In addition, it 
provides a number of products specifically designed for Cray Research 
computer systems. The UNICOS operating system supports the 
following compilers: Fortran, Pascal, C, Lisp, and Ada. 

The UNICOS operating system and UNIX are essentially the same in 
philosophy, structure, and function. However, Cray Research has 
enhanced UNIX to create the UNICOS operating system to use the 
power of the Cray Research computer system more efficiently. 
Enhancements include I/O capabilities to take advantage of 
supercomputer performance, added multiprocessor and multitasking 
support, additional networking software, accounting features, and others. 
The UNICOS operating system is designed for both interactive and batch 
environments. It supports the Network Queuing System (NQS) for batch 
processing. 



Multiprocessing divides an application program into independent tasks 
called partitions and runs them in parallel. Compared to serially 
executed programs, multiprocessing can substantially improve 
throughput. Three multiprocessing features have evolved, and they can 
all work together in a single program, but not in the same program unit. 
The following subsections describe the three multiprocessing features. 



Macrotasking Feature 



Macrotasking is the first phase in the evolution of Cray Research parallel 
processing software. It requires extensive data scoping and insertion of 
Cray Research-specific library calls that allow parallel execution of code 
at the subroutine level on multiple processors. Macrotasking is best 
suited to programs with large, long-running tasks. The user interface to 
the system's macrotasking capability is a set of Fortran-callable 
subroutines that explicitly define and synchronize tasks at the subroutine 
level. These subroutines are compatible with similar subroutines 
available on other Cray Research products. 



Microtasking Feature 



Microtasking is the second phase in the evolution of Cray Research 
parallel processing software. Microtasking expands on the strengths of 
macrotasking but requires less data scoping and uses compiler directories 
rather than Cray Research-specific library calls. 
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Microtasking is a multiprocessing technique that allows parallel 
execution of very small segments of code on multiple processors. An 
example of this is individual iterations of DO loops. With microtasking, 
the programmer can revise the code or issue compiler directives to 
further enhance performance beyond the automatic vectorization done by 
the compiler. 

In addition to working efficiently on parts of programs where the 
granularity is small, microtasking works well when the number of 
processors available for the job is unknown or may vary during the 
program's execution. Additionally, in a batch environment where 
processors may become available for short periods, the microtasked job 
can dynamically adjust to the number of available processors. 



Autotasking Feature 



The Autotasking feature (Autotasking) of the CF77 Fortran compiling 
system is the third phase in the evolution of Cray Research parallel 
processing software. Autotasking is based on the microtasking design 
and shares several advantages with microtasking: very low overhead 
synchronization cost, excellent dynamic performance independent of the 
number of central processing units (CPUs) available, both large and 
small granularity parallelism, and so on. 

Autotasking has two fundamental improvements over microtasking. 
First, it is automatic multiprocessing. Autotasking allows user programs 
to be automatically partitioned over multiple CPUs (without user 
intervention). Second, Autotasking can exploit parallelism at the 
DO-loop level without extending to subroutine boundaries. 



CF77 Compiling System 



CRAY Y-MP C90 computer systems use the Cray Research CF77 
compiling system. This compiling system is fully compliant with the 
ANSI 78 (Fortran 77) standards and offers a high degree of automatic 
scalar and vector optimization. The CF77 compiling system permits 
maximum portability of programs between different Cray Research 
systems and accepts many nonstandard programming constructs written 
for other vendors' compilers. Vectorized object code is produced from 
standard Fortran code; users can program in standard syntax to access the 
full power of the mainframe architecture. 

The CF77 compiling system consists of the following software: the 
Autotasking Fortran Dependence analyzer, the Fortran translator, and the 
Cray Research Fortran 77 compiler (CFT77). This system is a multipass, 
optimizing, transportable compiling system that processes existing 
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standard Fortran programs. It uses two basic techniques to improve the 
execution time of a Fortran program: vectorization and scalar 
optimization. 

The compiling system automatically generates code that uses the vector 
registers and functional units of the mainframe. The programmer does 
not need to know the details of vectorization because the compiling 
system automatically vectorizes Fortran programs. When the compiling 
system cannot vectorize code, it generates scalar code using a variety of 
optimization techniques to improve execution time. Scalar optimization 
transforms the internal representation of the Fortran program into a more 
efficient but functionally equivalent program. 

The CF77 compiling system is portable on several levels. Because it is 
in compliance with the ANSI 78 standard, programs written for other 
computer systems have maximal portability to a CRAY Y-MP C90 
system with minimal effort. Also, the compiling system is designed to 
run on all Cray Research systems enabling a Fortran program that 
compiles and runs on one Cray Research system to run on all Cray 
Research systems. In general, programs that compile and execute 
correctly with the CFT compiler also compile and execute correctly with 
the CF77 compiling system. 



C Compiler 



The C language is a high-level system programming language. Most of 
the UNICOS kernel code and utilities are written in C because C is a 
structured and highly efficient language. Many programming 
applications are also written in C. The C language offers a large standard 
library of functions and an ever-expanding base of software application 
programs. The availability of C complements the scientific orientation 
of Fortran. The Cray Research Standard C compiler performs scalar 
optimization and vectorizes code automatically. 

The Cray Research Standard C compiler is available on all Cray 
Research computer systems running the UNICOS operating system. The 
compiler translates C language statements into assembler instructions 
that make effective use of the Cray Research computer system. 

The C preprocessor (cpp) is included as a part of the Cray Research 
Standard C compiler. The cpp enables macro substitution, conditional 
compilation, and the inclusion of named files in the compilation process. 

Cray Research Standard C is portable on several levels. Because Cray 
Research Standard C is in compliance with the 1989 ANSI standard, 
programs written for other computer systems have maximal portability to 
a CRAY Y-MP C90 system with minimal effort. 
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Pascal 

Pascal is a high-level, general-purpose programming language used as 
the implementation language for the CF77 compiling system and other 
Cray Research products. Cray Research Pascal complies with the ISO 
Level 1 standard and offers such extensions to the standard as separate 
compilation of modules, imported and exported variables, and an array 
syntax. 

The Pascal compiler transforms Pascal code into machine language 
instructions that execute on Cray Research computer systems. Using 
Pascal, a programmer can implement algorithms and data structures in a 
high-level, machine-independent manner without sacrificing efficiency. 

The Cray Research Pascal compiler takes advantage of the mainframe 
hardware features through scalar optimization and automatic 
vectorization. The compiler provides access to Fortran common block 
variables and uses a common calling sequence that allows Pascal code to 
call Fortran and CAL routines. 



Cray Assembler 



The Cray Assembly Language (CAL) enables a user to closely tailor a 
program to the architecture of the mainframe. Through CAL, a 
programmer may symbolically express all hardware functions of the 
mainframe. CAL allows the production of highly efficient machine 
language programs. The user may designate program and data 
information to enable complete control of the mainframe CPUs. This 
facilitates full use of various features, such as the shared text feature, 
whereby a single set of instructions can service many users 
simultaneously. 



A set of versatile pseudo-operations for defining macro instructions and 
controlling the assembler enhances the basic instruction set. A macro 
library provides macro instructions for subroutine entry and exit, 
allowing for easy subroutine linkage. 



Cray Ada Environment 



The Cray Ada Environment includes a Cray Ada compiler and a set of 
related tools that link, debug, and maintain Ada application programs. 
The Cray Ada Environment also supports an implementation of Ada 
program libraries, providing for flexible, project-oriented software 
development. The Cray Ada Environment is validated under the current 
Ada Compiler Validation Capability (ACVC) test suite. 
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Cray Allegro CL 



Cray Allegro CL is a complete implementation of Common Lisp. The 
Cray Allegro CL system consists of an interpreter, an optimizing 
compiler, and a set of functions. There are a number of extensions to the 
specification in Cray Allegro CL. Included among the extensions are the 
top level, the debugger, a foreign function interface, and Flavors, an 
object-oriented programming system. Cray Allegro CL was designed to 
be compact, fast, and robust with respect to detecting user errors. The 
implementation itself is written mostly in Common Lisp, with some 
portions written in the C language. 



Subroutine Libraries 



Cray Research software includes subroutines that are callable from the 
CF77 compiling system, C, Pascal, and CAL. The subroutines are 
divided into libraries, generally on a functional basis. Libraries 
containing various utilities, high performance I/O subroutines, and 
numerous math and scientific routines are available, as are 
special-purpose libraries. 



Utilities 



A broad variety of software tools assists both interactive and batch users 
in the efficient use of a CRAY Y-MP C90 computer system. 



The SEGLDR segment loader is an automatic loader for code produced 
by the language processors CFT77, CFT, C, Pascal, and CAL that can 
also be explicitly controlled by the programmer. Program segments are 
loaded as required without explicit calls to an overlay manager. 

The Cray Symbolic Debugger (CDBX) allows users to interactively 
detect program errors by examining both running programs and program 
memory dumps. Other debugging tools are available for dump analysis 
and interpretation. 

A variety of performance aids assists in analyzing program performance 
and optimizing programs with minimal effort. These aids include both 
static and dynamic analyzers, as well as profilers for CPU and I/O usage. 
Many provide graphic user interfaces using the X Window System. 

The UNICOS Source Manager utility tracks modifications to files. This 
system is useful when programs and documentation undergo frequent 
changes because of development, maintenance, or enhancement. Line- 
and screen-oriented text editors, such as vi and Emac, offer versatility for 
users who wish to create and maintain text files. Other system utilities 
provide for proper management of the system resources. 
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Communications Software 

A CRAY Y-MP C90 computer system fits into environments consisting 
of single or multiple Cray Research systems, other vendors' mainframes, 
minicomputers, workstations, and devices capable of high-speed data 
transfer. Cray Research provides easy user access to Cray Research 
system capabilities, the ability to distribute applications between Cray 
Research computer systems and other vendor systems, and effective 
integration into existing customer networks. 

Communications and connectivity are supported by the Transmission 
Control Protocol/Internet Protocol (TCP/IP) suite, the International 
Standards Organization Open Systems Interconnect (ISO/OSI) protocol, 
and the UNICOS station call processor (USCP) protocol. 

The TCP/IP product allows the CRAY Y-MP C90 computer system to 
function as a peer in TCP/IP-supported, open networking environments. 
TCP/IP is a set of computer networking protocols that enables two or 
more hosts to communicate. Further, it is a set of procedures that allows 
communication among all hosts on a network whether the systems are 
similar or not. 

The TCP/IP networking protocols were defined by the U.S. Department 
of Defense and enhanced by the University of California at Berkeley 
with the UNIX system. TCP/IP is supported only under the UNICOS 
operating system. 

The ISO/OSI protocol logically connects Cray Research systems to other 
systems running ISO/OSI protocols. The UNICOS operating system 
supports the File Transfers, Access, and Management (FTAM) and 
Virtual Terminal (VT) applications of OSI. FTAM provides an 
interactive file transfer service for unstructured text files, unstructured 
binary files, and file directory files. VT allows users to connect to a 
remote system from a Cray Research system and to use the resources of 
the remote system. 

USCP provides, by way of station software products, support for 
communicating with various vendor systems through a vendor's 
proprietary networking capability, such as IBM's SNA or DEC's 
DECnet. 

Cray Research station software products provide system access to 
proprietary protocol implementations through network gateways. Cray 
Research supplies the station software packages for various front-end 
systems. These packages support batch job submission, job status, job 
control, file transfer, and interactive access to Cray Research systems. 
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The following stations are available: 

• Apollo station - provides the software connection between the Cray 
Research mainframe and the Domain workstation. 

• CYBER station - joins the Cray Research system to the Control 
Data Corporation CYBER 180 series, 70/170, or 700/800 systems 
to form a powerful computing combination. 

• VAX or VMS station - controls the hardware and software link 
between a DEC VAX computer system and a Cray Research 
computer system. 

• MVS station - provides the software connection between an IBM 
System/370, Extended Architecture, or compatible computer 
system and a Cray Research computer system. 

• VM station - enables IBM compatible systems running under 
control of the Virtual Machine/System Product (VM/SP) and 
Conversational Monitor System (CMS) to be linked with a Cray 
Research computer system. 

• UNIX station - provides Cray Research operating system access to 
installations whose front ends run UNIX. 

• SUPERLINK/MVS product - provides data access, 
application-to-application communication, and job processing 
between the UNICOS operating system and MVS systems. 

• CLS-UX product - provides Cray Research operating system 
access to UNIX users through a VAX or VMS system. 



Applications 



Cray Research supports application software vendors in converting and 
optimizing software for CRAY Y-MP C90 computer systems. Many of 
the most widely used application programs are currently available and 
supported to run in the Cray Research UNICOS environment. These 
codes are in fields such as computational fluid dynamics, structural 
analysis, mechanical engineering, nuclear safety, circuit design, seismic 
processing, image processing, molecular modeling, and artificial 
intelligence. 



Cray Research has also developed the UniChem and MPGS applications. 
UniChem is an integrated software environment for chemists. MPGS is 
an interactive postprocessing visualization tool. 
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The availability of applications for Cray Research systems is driven 
largely by customer requirements that are communicated to the software 
vendors. Cray Research supports the on-going process of converting and 
maintaining application software. 



Software Publications 



The following subsections provide a partial list of Cray Research 
software publications. The manuals provide additional information 
about the software described in this section. These manuals and other 
user publications can be ordered through Cray Research local or regional 
sales offices. Refer to the User Publication Catalog (publication number 
CP-0099) for a complete list of software publications. 



UNICOS Operating System 

Publication 



Fortran 



Number Title 

SG-2005 I/O Subsystem (IOS) Operator's Guide for UNICOS 

SG-2017 UNICOS Source Code Control System (SCCS) User's 

Guide 

SG-2050 UNICOS Text Editors Primer 

SG-2052 UNICOS Overview for Users 

SG-2112 UNICOS Installation Guide 

SG-2113 UNICOS System Administration for Source Releases 

SR-2011 UNICOS User Commands Reference Manual 

SR-2012 Volume 4: UNICOS System Calls Reference Manual 

SR-2014 UNICOS File Formats and Special Files Reference Manual 

SR-2022 UNICOS Administrator Commands Reference Manual 



Publication 
Number Title 

SR-3071 CF77 Compiling System, Volume 1: Fortran Reference 
Manual 



Publication 
Number Title 

SR-2074 Cray Standard C Programmer's Reference Manual 
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Pascal 



Libraries 



Utilities 



Publication 
Number Title 

SR-0060 Pascal Reference Manual 



Publication 
Number Title 

SR-2057 Volume 5: UNICOS Network Library Reference Manual 
SR-2079 Volume 1: UNICOS Fortran Library Reference Manual 
SR-2080 Volume 2: UNICOS Standard C Library Reference Manual 
SR-2081 Volume 3: UNICOS Math and Scientific Library Reference 
Manual 



Publication 

Number Title 

SD-2107 I/O Subsystem Model E (IOS-E) Guide 

SG-2051 UNICOS Tape Subsystem User's Guide 

SG-2094 UNICOS CDBX Debugger User's Guide 

SG-3074 CF77 Compiling System, Volume 4: Parallel Processing 

Guide 

SG-3078 OWS-E Operator Workstation Operator's Guide 

SG-3079 OWS-E Operator Workstation Administrator's Guide 

SR-0010 Software Tools Reference Manual 

SR-0066 Segment Loader (SEGLDR) and Id Reference Manual 

SR-2091 UNICOS CDBX Symbolic Debugger Reference Manual 

SR-3077 OWS-E Operator Workstation Reference Manual 



Communications Software 



Publication 

Number Title 

SA-0250 Apollo DOMAIN Station Reference Manual 

SC-0270 CDC NOS/VE Link Software Command Reference Manual 

SG-2009 TCP/IP and OSI Network User's Guide 

SI-0038 IBM MVS Station Reference Manual 

SI-0160 IBM VM Station Command and Reference for COS 

SI-0191 SUPERLINK Guides 
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Publication 

Number Title 

SR-0034 CDC NOS/BE Station Reference Manual 

SR-0035 CDC NOS Station Reference Manual 

SU-0107 UNIX Station User's Guide 

SU-3121 CLS-UX Station User Guide 

SV-0020 DEC VAX/VMS Station Reference Manual 



Publication 
Number 



Title 



MCDR-1000M 



Directory of Applications Software for Cray 
Research Supercomputers for 1992 



Software Training 



Cray Research offers complete training on the software available for 
CRAY Y-MP C90 computer systems. Extensive user-support analyst and 
system analyst training is available at Cray Research's training facility. 
End-user and operator training are available at customer sites after 
installation of a Cray Research computer system. More information 
regarding courses and schedules can be obtained through your local or 
regional Cray Research sales office. 
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GLOSSARY 



Application 
A register 

Autotasking 



Auxiliary input/output 
processor (EIOP) 



Software designed to perform a particular job or set of related jobs. 

Address register. A registers are primarily used as address registers for 
memory references and as index registers. 

The process of automatically dividing up a program into individual tasks 
and organizing them to make the most efficient use of the Cray Research, 
Inc. computer hardware; a trademark of Cray Research, Inc. 

A quarter board in the IOS-E that controls the transfer of data between 
the channel adapters (CAs) and the IOS-E buffer board. 



B 



Bank The smallest addressable division of central memory. 

B register Intermediate address register. B registers are used as intermediate 
storage for the A registers. 

BDM Bidirectional memory mode (bit). The modes field in the exchange 

package contains the BDM mode bit. When the BDM mode bit is set, 
block read and write operations can operate concurrently. 

BiCMOS Bipolar complementary metal oxide semiconductor. 

BPI Breakpoint interrupt (flag). The breakpoint interrupt flag sets if the 

interrupt-on-breakpoint (IBP) interrupt mode bit is set and enabled and a 
write reference is made to an address within the breakpoint range. 

Buffer board A module in the IOS-E that temporarily stores system data from the 
high-speed (HISP) channels or the channel adapters (CAs). 
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CAL 

CCA-1 

Central memory 

Central processing unit 
(CPU) 

Chaining 

Channel adapter (CA) 
Check bits 



CLN 



Clusters 



Cluster interface (CIN) 

Compiler 

CP 

CRI 
C90 Mode 



Cray Assembly Language. A symbolic language that generates machine 
instructions on a one-for-one basis and allows programs to call 
subroutines from the library through the use of pseudoinstructions. 

A channel adapter that connects the IOS-E to a 6-Mbyte/s channel pair. 

Memory residing in the mainframe. 

A module used in the mainframe that controls the flow of system data, 
performs mathematical and logical functions on system data, and 
executes program instructions. 

The process of sequencing logical operations so the results of one 
operation may be used by another operation without needing a memory 
reference in between. 

A component in the IOS-E that transfers control and data between the 
buffer board and the peripherals. 

Bits used to determine if one or more bits in a word have an incorrect 
value. 

Cluster number (register). The CLN register in the exchange package 
determines which set of the 17io available clusters of SB, ST, and SM 
registers the CPU can access. 

A set of shared registers accessible by all CPUs. There are 17io valid 
clusters of shared registers in a CPU. 

A quarter board in the IOS-E that transfers control and data between the 
workstation interface (WIN) and the input/output processor multiplexer 
(IOP MUX) and auxiliary input/output processors (EIOPs). 

A software program used to convert high-level programming language 
into binary machine code. 

Clock period. The CP is the interval in which the system clock 
completes one oscillation. 

Cray Research, Inc. 

The modes field in the exchange package contains the C90 mode bit. 
When the C90 mode bit is set, the mainframe operates in C90 mode. 
The V registers are 64 bits x 128 elements, the VL register is 8 bits 
wide, and the P register is 32 bits wide, which enables a program range 
of 1 Gword. The entire CRAY Y-MP C90 instruction set is also 
available. 
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DCA-1 A channel adapter that connects the IOS-E to DS-40, DS-41, and DS-49 
disk storage units. 

DCA-2 A channel adapter that connects the IOS-E to DD-60 and DD-61 disk 
drives. The DCA-2 disk channel adapter in the IOP manages control 
signals and protocol for the individual disk drives in a DE-60. 

DBA Data base address (register). The DBA register, part of the exchange 
package, holds the base address of the user's data range. 

DCC-2 The Cray Research DCC-2 houses the DC-40, which is separate from the 
DD-40 disk drives. 

DCU Disk controller unit. An interface between the disk storage units (DSUs) 
and the auxiliary I/O processor (EIOP). 

DC-40 The Cray Research DC-40 disk controller provides interface logic to 
adapt DCA-1 signals and protocol for individual DS-40s, to handle 
routing among the drives, and to buffer data from the four spindles in a 
full-track buffer. 

DC-41 The Cray Research DC-41 disk controller provides interface logic to 
adapt DCA-1 signals and protocol for individual disk drive units, to 
handle routing among the drives, and to buffer data from the four 
spindles in a full-track buffer. 

DD-40 The Cray Research DD-40 disk drive. 

DD-41 The Cray Research DD-41 disk drive. 

DD-49 The Cray Research DD-49 disk drive. 

DD-60 The Cray Research DD-60 high-performance disk drive. 

DD-61 The Cray Research DD-61 high-performance disk drive. 

DE-60 One Cray Research DE-60 disk enclosure cabinet contains a maximum 
of nine DD-60 disk drives. Eight of the disk drives store system data, 
and the ninth disk drive is a spare. The DCA-2 disk channel adapter in 
the IOP manages control signals and protocol for the individual disk 
drives in a DE-60. 

Deadlock A condition resulting in the inability to continue processing due to an 
unresolvable conflict. Deadlock occurs when all CPUs in a cluster are 
holding issue on a test and set instruction. 



HR-04028 Glo-3 



Glossary 



CRAY Y-MP C90 Functional Description Manual 



D (continued) 



Deadstart 



Dielectric coolant 



DL 



The sequence of operations required to start an operating system running 
in a Cray Research computer system. 

A fluid that travels through the module cold plates, removes heat from 
the modules, and transfers the heat to refrigerant in the heat exchange 
subassembly. 

Deadlock interrupt (flag). The deadlock interrupt flag sets if the IDL 
interrupt mode bit is set, the program is not in monitor mode, and a 
deadlock condition occurs because all CPUs in a cluster are holding issue 
on a test and set instruction. 



DLA Data limit address (register). The DLA register holds the upper limit 
address of the user's data range. 

DRAM Dynamic random access memory. A memory device that must be 
refreshed periodically in order to store data. 

DSU Disk storage units. A computer disk drive. 

DS-40 The DS-40 disk subsystem consists of the DD-40 disk drive, the DC-40 
disk control unit (DCU), and the disk controller cabinet (DCC-2). 

DS-41 The DS-41 disk subsystem consists of the the DC-41 disk controller and 
the DD-41 disk drive. 



EIM Enable interrupt modes (flag). The interrupt modes field in the exchange 
package contains the EIM flag. An exchange to monitor mode or 
non-monitor mode sets the EIM flag. 

EIOP Auxiliary I/O processor. The EIOP controls the channel adapters that 
connect the IOS-E to peripheral devices such as disk drives, tape drives, 
and communications channels. 

EMI Electromagnetic interference. EMI is radiated energy that interferes with 
and distorts digital signals. 

ESL Enable second vector logical mode (bit). The modes field in the 

exchange package contains the ESL mode bit. When the ESL mode bit 
is set, the second vector logical functional unit is enabled, and if it is not 
busy, it has first priority to execute instructions 140ijk through 145ijk. 

Ethernet A particular type of network hardware that forms a physical link between 
computers; a trademark of Xerox Corporation. 
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E (continued) 



Exchange mechanism 



Exchange package 



EXX 



The technique used in the CRAY Y-MP C90 computer system for 
switching instruction execution from program to program. Refer to 
exchange package. 

A 16-word block of data in memory reserved for exchange packages. 
The exchange package contains the necessary registers and flags 
associated with a particular program. Each program has its own 
exchange package. 

Error exit interrupt (flag). The EXX interrupt flag sets if the 
enable-interrupt-on-error-exit (FEX) interrupt mode bit is set and enabled 
and an error exit instruction (000000) issues. Issuing an error exit 
instruction always causes an exchange, regardless of the state of FEX. 



F 



FEI 



Fetch sequence 



Floating-point (operation). When an F appears in front of a register 
designator in a symbolic machine instruction, the calculation is a 
floating-point operation. 

Front-end interface. An interface that connects the CRAY Y-MP C90 
computer I/O channels to channels of front-end computers. An FEI 
compensates for differences in channel widths, machine word size, 
electrical logic levels, and control signals. 

A fetch sequence transfers a block of instructions from memory to an 
instruction buffer. 



FEX 



Floating-point 
operation 

Fluorinert Liquid 



Enable-interrupt-on error exit Mode (bit). The interrupt modes field in 
the exchange package contains the FEX bit. When the FEX bit is set, it 
enables an interrupt when an error exit occurs. 

A mathematical or logical operation on two or more real numbers. 



The dielectric coolant circulated through the module cold plates; a 
trademark of 3M. 



FNX Interrupt-on-normal exit mode (bit). The interrupt modes field in the 
exchange package contains the FNX bit. When the FNX bit is set, it 
enables the NEX interrupt flag to set if a normal exit instruction 
(004000) issues. Issuing a normal exit instruction always causes an 
exchange, regardless of the state of FNX. This mode is not affected by 
the EIM flag. 
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F (continued) 



FOL-3 



FPE 



FPS 



Functional unit 



Fiber-optic link. The Cray Research 3-Mbyte/s fiber-optic link allows an 
FEI to be separated from a Cray Research computer system by distances 
of up to 3,281 ft (1,000 m). The FOL-3 provides complete electrical 
separation of the connected devices. 

Floating-point error (flag). The interrupt flags field in the exchange 
package contains the FPE flag. The FPE flag sets when a floating-point 
range error occurs in any of the floating-point functional units and the 
Interrupt-on-floating-point error (IFP) flag is set. 

Floating-point error status (bit). The status field in the exchange package 
contains the FPS status bit. The floating-point status bit sets if a 
floating-point error occurred during the execution interval. 

Circuitry designed to perform a particular mathematical or logical 
operation. 



Gate array 



Gather/scatter 



An array of circuits contained in a single integrated circuit package; 
these circuits may be customized in their operation as a group. 

An operation that places data at various intervals in the available 
memory storage and then gathers the data back into its original 
organization. 



H 



H 

HCA-3 

HCA-4 

HDA 

HEU 



High Performance 

Parallel Interface 

(HIPPI) 



Half-precision floating-point (operation). When an H appears in front of 
a register designator in a symbolic machine instruction, the calculation is 
a half -precision floating-point operation. 

A channel adapter that connect the IOS-E to a HIPPI input channel. 

A channel adapter that connects the IOS-E to a HIPPI output channel. 

Head disk assembly. A sealed assembly that contains the magnetic 
storage media (disk drive), read and write heads, and servo mechanism. 

Heat exchange unit. Part of the cooling equipment for the 

CRAY Y-MP C90 mainframe. The HEU uses a refrigerant to cool the 

dielectric coolant that circulates through the mainframe. 

A type of interface used to transfer control and data between Cray 
Research, Inc. channel adapters and peripherals. 
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H (continued) 



High-speed (fflSP) 
channel 



A channel that transfers system data between the IOS-E and the 
mainframe or between the IOS-E and the SSD-E. 



High-speed control 
multiplexer (HCM) 



A quarter board in the IOS-E that controls the transfer of high-speed 
(HISP) channel information between the IOS-E and the SSD-E or 
mainframe. 



HYPERchannel 



A trademark and product of Network Systems Corporation (NSC) that 
interfaces a LOSP channel to other brands of computers. 



I Reciprocal iteration (operation). When an I appears in front of a register 
designator in a symbolic machine instruction, the calculation is a 
reciprocal iteration operation. 

IBA Instruction base address (register). The IBA register is in the exchange 
package. The IBA register holds the base address of the user's 
instruction range. 

IBP Interrupt-on-breakpoint mode (bit). The interrupt modes field in the 
exchange package contains the IBP bit. When the IBP bit is set, it 
enables an interrupt if a breakpoint occurs. 

ICM Interrupt-on-correctable memory error mode (bit). The interrupt modes 
field in the exchange package contains the ICM bit. When the ICM bit is 
set, it enables interrupts on correctable memory data errors while data is 
being read from memory. 

ICP Interprocessor Interrupt (flag). The interprocessor interrupt flag sets if 

the IIP interrupt mode bit is set and enabled and another CPU requests an 
interrupt of this CPU by issuing instruction 0014/1. 

IDL Interrupt-on-deadlock mode (bit). The interrupt modes field in the 
exchange package contains the IDL bit. When the IDL bit is set, it 
enables an interrupt if a deadlock occurs while the program is not in 
monitor mode. IDL has no effect in monitor mode. 

IFP Interrapt-on-floating-point error mode (bit). The interrupt modes field in 
the exchange package contains the IFP bit. When the IFP bit is set, it 
enables interrupts on floating-point errors. 

IIO Interrupt-on-I/O mode (bit). The interrupt modes field in the exchange 
package contains the IIO bit. When the IIO bit is set, it enables an 
interrupt if SIE is set and this CPU is the lowest-numbered CPU with 
110=1 and EIM=1. 
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I (continued) 



HP Interrupt-on-interprocessor interrupt mode (bit). The interrupt modes 
field in the exchange package contains the IIP bit. When the IIP bit is 
set, it enables an interprocessor interrupt if requested by another CPU. 

ILA Instruction limit address (register). The ILA register is in the exchange 
package. The ILA register holds the limit address of the user's 
instruction field. 

IMC Interrupt-on-request from MCU mode (bit). The interrupt modes field in 
the exchange package contains the IMC bit. When the IMC bit is set, it 
enables interrupts from the MCU. 

IMI Interrupt-on-monitor mode instruction mode (bit). The interrupt modes 
field in the exchange package contains the IMI bit. When the IMI bit is 
set, it enables an interrupt if a monitor mode instruction (QOlijk; jf^O) 
issues while the program is not in monitor mode. IMI has no effect in 
monitor mode. 



Input/output cluster 
(IOC) 



Input/output processor 
multiplexer (IOP MUX) 



Input/output subsystem 
model E (IOS-E) 



Intelligent peripheral 
interface - 2 (IPI-2) 

Interrupt modes field 



Instruction buffer 



Instruction fetch 



A component of the IOS-E that contains one cluster interface (CIN), one 
input/output processor multiplexer (IOP MUX), four auxiliary 
input/output processors (EIOPs), two high-speed control multiplexers 
(HCMs), one buffer board, sixteen channel adapters (CAs), and one 
programmable interrupt (PINT). 

A quarter board in the IOS-E that transfers information between the 
high-speed control multiplexers (HCMs), the cluster interface (CIN), and 
the auxiliary input/output processors (EIOPs). 

A component of a CRAY Y-MP C90 computer system that transfers 
system data between the peripherals, SSD solid-state storage device 
model E (SSD-E), and the mainframe. 

An interface used to transfer control and system data between Cray 
Research, Inc. channel adapters and peripherals. 

The interrupt modes field in the exchange package contains 
user-selectable bits that dictate the execution of the program. 

A set of registers in a CRAY Y-MP C90 mainframe used for temporary 
storage of instructions before issue. Each instruction buffer can hold 128 
consecutive instruction parcels. 

The process of loading program code from central memory to an 
instruction buffer. 
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I (continued) 



Instruction set 
I/O buffer 

IOC 
IOI 

IOP 

IOR 



IPC 



IPI-2 



IPR 



IRP 



IRT 



Issue sequence 



A set of instructions that a particular computer can perform. 

A buffer used to provide temporary storage for data transferred between 
the mainframe and peripheral devices. 

Refer to input/output cluster. 

I/O interrupt (flag). The I/O interrupt flag sets if the SIE bit is set and 
this CPU is the lowest-numbered CPU with IIO interrupt mode set and 
enabled when a LOSP or VHISP channel completes a transfer. 

I/O processor. An IOP is a fast, multipurpose computer capable of 
transferring data at extremely high rates. The IOS-E contains multiple 
IOPs. 

Operand range error mode (bit). The interrupt modes field in the 
exchange package contains the IOR bit. When the IOR bit is set, it 
enables interrupts on operand address range errors. 

Interrupt-on-request from programmable clock mode (bit). The interrupt 
modes field in the exchange package contains the IPC bit. When the IPC 
bit is set, it enables an interrupt on a request from the programmable 
clock. 

An interface used to transfer control and system data between Cray 
Research, Inc. channel adapters and peripherals. 

Interrupt-on-program range error mode (bit). The interrupt modes field 
in the exchange package contains the IPR bit. When the IPR bit is set, it 
enables the PRE interrupt flag to set if a program range error occurs. 

Interrupt-on-register parity error mode (bit). The interrupt modes field in 
the exchange package contains the IRP bit. When the IRP bit is set, it 
enables an interrupt if a register parity error is detected while data is 
being read from a register. 

Interrupt-on-request from real-time clock mode (bit). The interrupt 
modes field in the exchange package contains the IRT bit. When the IRT 
bit is set, it enables an interrupt on a request from the real-time clock. 

The issue sequence selects the instruction indicated by the program 
address (P) register, decodes it, determines whether the required registers 
or functional units are available, and if so, enables the CPU to execute 
the instruction. 
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IUM Interrupt-on-uncorrectable memory error mode (bit). The interrupt 

modes field in the exchange package contains the IUM bit. When the 
IUM bit is set, it enables interrupts on uncorrectable memory data errors. 



Library 



LLRC 



Low-speed (LOSP) 
channel 



A set of commonly used software routines that are available to 
programmers and to programs being compiled. 

Length/longitudinal redundancy check. An error control system based 
on the arrangement of data in blocks according to some preset rule, the 
correctness of each character within the block being determined on the 
basis of the specific rule or set. 

Low-speed channel. The LOSP channel has a transfer rate of 6 or 
20 Mbytes/s and enables an I/O cluster to communicate with a 
mainframe, front-end interface, or SSD-E. 



M 



Mainframe A component of a CRAY Y-MP C90 computer system that contains 
central memory and central processing units (CPUs). 

Maintenance channel A channel that connects the MWS-E to the SSD-E. 



Maintenance control 
unit interface (MCUI) 



A quarter board in the mainframe that receives an interrupt signal from 
the programmable interrupt (PINT) and interrupts the specified CPU in 
the mainframe. 



Maintenance 
workstation model E 

(MWS-E) 

MCU 



A component of a CRAY Y-MP C90 computer system that provides an 
intelligent and dedicated platform for performing offline and online tests, 
monitoring environmental conditions, and recording hardware errors. 

Maintenance control unit. The maintenance control unit for the 
CRAY Y-MP C90 in the MWS-E. 



MCU Maintenance control unit interrupt (flag). The MCU interrupt flag sets if 
the IMC interrupt mode bit is set and enabled and the MCU interrupt 
signal becomes active on I/O channel 40. 

MGS Motor-generator set. An MGS converts primary power from commercial 
power mains to the voltage and frequency used by the mainframe power 
supplies. 
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M (continued) 



MEC Memory error correctable interrupt (flag). The memory error 
(correctable) flag sets if the ICM interrupt mode bit is set and a 
correctable memory error occurs while data is being read from memory. 

MEU Memory error uncorrectable (flag). The memory error (uncorrectable) 
flag sets if IUM interrupt mode is set and enabled and an uncorrectable 
memory error occurs while data is being read from memory. 

MFC Mainframe chassis. 

Mil Monitor instruction interrupt (flag). The monitor instruction interrupt 
flag sets if the IMI interrupt mode bit is set and a monitor mode 
instruction (001ijk;j * 0) issues while the program is not in monitor 
mode. 



MM 



Monitor mode 

Multitasking 

Multiprocessing 
Multiprogramming 



Monitor mode (bit). The modes field in the exchange package contains 
the MM bit. When the MM mode bit is set, it inhibits all interrupts 
except memory errors, normal exit, and error exit. The program can 
execute those instructions that are privileged to monitor mode. 

A condition in which a CPU inhibits all interrupts except those caused by 
memory errors, normal exit, or error exit instructions. 

The capability to run two or more parts, or tasks, of a single program in 
parallel on different CPUs within a mainframe. 

Several computer processes or jobs being computed at the same time. 

The process of writing software to utilize the capabilities of a computer 
to process multiple jobs simultaneously. 



N 



NEX Normal exit interrupt (flag). The interrupt flags field in the exchange 
package contains the NEX flag. The normal exit flag sets if the 
interrupt-on-normal-exit (FNX) interrupt mode bit is set and enabled and 
a normal exit instruction (00400) issues. Issuing a normal exit 
instruction always causes an exchange, regardless of the state of FNX. 
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ORE 



Operating system 



Operator workstation 
model E (OWS-E) 



Operand range error (flag). The interrupt flags field in the exchange 
package contains the ORE flag. The ORE flag sets when a data 
reference is made outside the boundaries of the DBA and DLA registers 
and the interrupt-on-operand range error bit is set. 

The major controlling software running in a computer that controls its 
overall operation. 

A component of a CRAY Y-MP C90 computer system that Cray 
Research, Inc. analysts and customers use to monitor the computer 
system. 



P Population count (operation). When a P appears in front of a register 
designator in a symbolic machine instruction, the calculation is a 
population count operation. 

Pascal A high-level programming language. 

Parcel A 16-bit portion of a word that is addressable for instruction execution 
but not for operand references. 

Parity Equivalence in the check bit of transmitted and received data. 

P register Program address register. The P register selects an instruction parcel 
from one of the instruction buffers. The contents of the P register are 
stored in the program address register field in the exchange package. 
The P register is 24 bits wide in Y-MP mode and 32 bits wide in C90 
mode. 

PCI Programmable clock interrupt (flag). The programmable clock interrupt 
flag sets if the IPC interrupt mode bit is set and enabled and the counter 
in the programmable clock equals 0. 

Pipelining An operation or instruction that begins before a previous operation or 
instruction finishes. Pipelining is accomplished using fully segmented 
hardware. 

PN Processor number. The PN field in an exchange package indicates which 
CPU executed the exchange sequence. 

Port A hardware or software access path to memory. 
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P (continued) 



PRE 



Programmable clock 



Program range error (flag). The interrupt flags field in the exchange 
package contains the PRE flag. The PRE flag sets when an instruction 
fetch is made outside the boundaries of the IBA and ILA registers. 

A 32-bit counter in each CPU that is used to generate interrupts at 
selectable intervals. 



Programmable interrupt 
(PINT) 

Protocol 



PS 



A quarter board in the IOS-E that enables any input/output cluster (IOC) 
in the IOS-E to interrupt any CPU in the mainframe. 

Software that defines the precise way in which data is transferred from 
one place to another. 

Program status (bit). The interrupt modes field in the exchange package 
contains the PS bit. The PS bit is set by the operating system to denote 
whether a CPU concurrently processing a program with another CPU is 
the master or slave in a multitasking situation. 



Q 



Parity count (operation). When a Q appears in front of a register 
designator in a symbolic machine instruction, the calculation is a parity 
count operation. 



R Rounded floating-point (operation). When an R appears in front of a 

register designator in a symbolic machine instruction, the calculation is a 
rounded floating-point operation. 

RAM Random access memory. A memory device that retains the stored data 
as long as power is applied. When power is removed from the device, 
the stored information is lost. 



RCU 



Reciprocal 
approximation 

Refrigerant 



Refrigeration and condensing unit. The RCUs dissipate the heat 
transferred from the heat exchanger units (HEUs). 

The mathematical process of approximating the value of a real number 
when divided into one (1/n). 

A fluid that removes heat from dielectric coolant and transfers the heat to 
water or air. 



Register A hardware storage location for one word, byte, or element of data. 
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R (continued) 



RISC Reduced instruction set computer. 

RPE Register parity error (flag). When a word is written into a register, a set 
of parity bits is generated and stored with the data bits. This set of parity 
bits is compared to another set that is generated when the word is read 
out of the register. An error is indicated when the two sets do not match. 
Parity errors set the register parity error (RPE) flag in the exchange 
package. 

RT Load real-time clock (instruction). The RT instruction loads the 
real-time clock register with the contents of a scalar (S) register. 

RTC Real-time clock. The RTC is a 64-bit counter that advances one count 
each clock period. 

RTI Real-time interrupt (flag). The real-time interrupt flag sets if the IRT 
interrupt mode bit is set and enabled and a real-time interrupt request is 
received. 



Scalar A single numerical value that represents a single aspect of a physical 
quantity. 

Section A major addressable division of central memory that may be further 
divided into subsections and banks. 

Semaphore A 1-bit value stored in a register and used by programs to communicate 
the occurrence of an event. 



Shared registers 



Registers that are available for more than one CPU to write to and read 
from. 



Single-byte correction/ 

double-byte detection 

(SBCDBD) 

Single-error 

correction/double-error 

detection (SECDED) 

Small computer system 
interface (SCSI) devices 

S register 



A method of detecting whether one or more 4-bit bytes in a word has an 
incorrect value. If only one byte has an incorrect value, that byte can be 
changed back to the correct value. 

A method of detecting whether one or more bits in a word has an 
incorrect value. If only one bit has an incorrect value, that bit can be 
changed back to the correct value. 

Components of the MWS-E and OWS-E that store and retrieve data used 
in the workstations. 

Scalar register. The S registers are the source and destination registers 
for operands executing scalar arithmetic and logical instructions. 
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S (continued) 



SB 



Segmentation 



SEI 



Shared registers 



Shared address (register). The SB register is a shared register used for 
transferring address information from one CPU to another. 

An operation that is divided into a discrete number of sequential steps, or 
segments. Fully segmented hardware is designed to perform one 
segment of an operation during a single clock period (CP). 

Selected for external interrupts (flag). The interrupt modes field in the 
exchange package contains the SEI flag. When the SEI flag is set, this 
CPU is preferred for I/O interrupts. 

Registers that are available for more than one CPU to write to and read 
from. 



SM Semaphore (register). The SM register is a shared register used for 
control between CPUs. 



Source code 
Spindle 
SPARC 



SSD solid-state storage 
device model E (SSD-E) 

ST 



Status field 



Subsection 



System Maintenance and 

Remote Testing 

Environment (SMARTE) 



A software program written in a high-level programming language. 

A component of a DD-40 or DD-41 disk drive. 

Scaleable Processor ARChitecture. A trademark of SPARC 
International. 

A component in a CRAY Y-MP C90 computer system that provides 
secondary data storage for the IOS-E and the mainframe. 

Shared T (register). The ST register is a shared register used for 
transferring data from one CPU to another. 

The exchange package contains a status field that is used to determine 
the operating modes of a CPU. 

A major addressable division of memory that may be further divided into 
banks. 

A Cray Research trademark; an online program that performs hardware 
verification, error detection, error isolation, and automated degradation 
of faulty hardware components. 
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T register Intermediate scalar register. The T registers are used as intermediate 
storage for the S registers. 



TCA-1 Tape subsystem channel adapter. 



u 



UNICOS An operating system for Cray Research computer systems based 

primarily on the UNIX System Laboratories, Inc. UNIX System V and 
partially on the Fourth Berkeley Software Distribution. UNICOS is 
essentially the same in philosophy, structure, and function as UNIX, but 
has been enhanced to exploit the power of Cray Research computer 
systems. 



Vector A single numerical value that contains information on more than one 
aspect of a physical quantity. 



Very high-speed 
(VfflSP) channel 



A channel that transfers system data between the mainframe and the 
SSD-E. 



V register Vector register. Each V register contains 64 bits x 64 elements in Y-MP 
mode and 64 bits x 128 elements in C90 mode. 

VL Vector length (register). The program-selectable VL register controls the 
effective length of a vector register for any operation. The VL register is 
7 bits wide in Y-MP mode and 8 bits wide in C90 mode. 

VM Vector mask (register). The VM field allows for the logical selection of 
particular elements of a vector. 

VNU Vector not used status (bit). The state of the VNU bit in the exchange 
package status field indicates whether vector instructions (077xxx or 
140xxx through 177xxx) were issued during the execution interval. 
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w 



Warning and control 
system (WACS) 



Word 



A system that monitors the refrigeration and power distribution systems 
to ensure that the computer system is operating within recommended 
temperature and voltage limits. 

A set amount of data that contains 64 bits of system data and 8 check 
bits. 



Workstation interface 

(WIN) 

ws 



A quarter board in the IOS-E that transfers control and data between the 
MWS-E or OWS-E and the cluster interface (CIN). 

Waiting for semaphore status (bit). The interrupt modes field in the 
exchange package contains the WS status bit. The waiting on semaphore 
bit sets if a test and set instruction (0034/&) is holding issue. 



XA Exchange address (register). The XA register in the exchange package 
specifies the address of the first word of a 16- word exchange package 
loaded by an exchange sequence. 



Y-MP mode Y-MP mode is selected by setting the C90 mode bit in the exchange 
package to 0. When the mainframe is operating in Y-MP mode, the V 
registers are 64 bits x 64 elements, the VL register is 7 bits wide, and the 
P register is 24 bits wide, which enables a program range of 4 Mwords. 
In Y-MP mode, only instructions defined in previous CRAY Y-MP 
systems are available. 



Leading-zero count (operation). When a Z appears in front of a register 
designator in a symbolic machine instruction, the calculation is a 
leading-zero count operation. 
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Related Cray Research, Inc. hardware manuals are listed below. Refer to 
Section 6 for a list of related software manuals. 



Cray Research Peripheral Equipment Site Planning Reference Manual, 
CRI publication number HR-00080. 

This manual provides site planning information about operator and 
maintenance workstation equipment, disk storage units (DSUs), and 
front-end interface (FEI) cabinets. 

Cray Support Equipment Site Planning Reference Manual, CRI 
publication number HR-00082. 

This manual provides site planning information about refrigeration 
condensing units (RCUs) and motor-generator sets (MGSs). 

CRAYY-MP C90 Site Planning Reference Manual, CRI publication 
number HR-04025. 

This manual describes the physical requirements for the 
CRAY Y-MP C90 computer system. It defines customer and Cray 
Research, Inc. site planning and preparation responsibilities. This 
manual also describes the operational requirements, system 
configurations, mainframe and cooling unit specifications and 
requirements, and computer room floor specifications. 



Principles of Computer Room Design, CRI publication number 
HR-04013. 

This manual describes computer room design principles to help 
computer room facility managers prepare, inspect, and maintain a 
stable, problem-free environment. Information on computer room 
and raised-floor construction, system cooling, environmental control, 
fire and lightning protection, power, and grounding is also discussed. 
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Safe Use and Handling of Fluorinert Liquids, CRI publication number 
HR-0306. 

This manual is written for Cray Research, Inc. customers and field 
engineers whose Cray Research computer system uses Fluorinert 
Liquid. The manual warns and informs about using Fluorinert 
Liquid and describes its uses at Cray Research, Inc. The manual 
describes the Material Safety Data Sheet and explains its significance 
in using Fluorinert Liquid or any other chemical. 
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Boldface numbers refer to illustrations and 
charts. 



Access conflicts, 2-38 

ACVC, 6-5 

AC voltage monitoring, 1-11 

Adapters, network, 5-14. See also channel 

adapters 
Adding coefficients. See Algorithms 
Address 

(A) registers, 2-4 

data, 2-4 
Address add functional unit, 2-8. See also 

Integer arithmetic 
Address multiply functional unit, 2-8. See also 

Integer arithmetic 
Address registers. See A register 
Algorithms 

floating-point addition, 2-18--2-19 

floating-point division, 2-21-2-24 

floating-point multiplication, 2-19-2-20 

functional units, 2-7 
Alternate-path configuration 

DD-60, 5-5 

DS-41, 5-9 
AND function, 2-11 
Apollo station, 6-8 
Applications, software, 6-8-8-9 
Approximating roots, 2-22 
A register 

applications, 2-6 

computation section, 2-6 

fields, 2-32 

transfer instructions, 2-56, 2-59 
Autotasking, 2-34, 2-42, 6-4 



Bases, number, vi 

Biased and unbiased exponent ranges, 

2-15-2-16 
BiCMOS chips, 2-1 



Bidirectional memory transfer instructions, 

2-62-2-63 
Bit count instructions, 2-73 
24-bit integer multiply (performed in a 

floating-point multiply functional unit), 

2-13 
32-bit integer multiply (performed in a 

floating-point multiply functional unit), 

2-14 
Block diagrams 

CRAY Y-MP C90 cooling system, 1-9 
mainframe, 2-2 
Block transfers, 2-38 
Branch instructions, 2-74-2-77 
Breakpoint interrupt instructions, 2-79 
B registers, 2-6. See also Intermediate 



Cabinet, fiber-optic, 5-13 

Cabling. See Fiber-optic link 

CAL, 6-5 

CAL syntax, special forms, 2-53. See also 

specific instructions 
CCA-1 channel adapter, 3-4 
C compiler, 6-4 
CDBX, 6-6 
Central memory 

access conflicts, 2-38 

as a functional unit, 2-8 

general, 2-1-2-3 

overview, 1-3 

RAM, 2-1 
CF77 compiling system, 6-3-6-4 
Chaining 

instruction sequence, 2-40 

vector, 1-3 

vector example, 2-39, 2-40 
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Channel 

adapters, 1-5, 3-1, 3-4-3-6, 5-14 

control instructions, 2-77 

types, 1-3, 2-3 
Channels 

fflSP, 1-4, 1-5, 3-1, 3-3-3-4, 4-3 

LOSP, 1-4, 3-1, 3-3-3-4 

vrasp, 1-5, 4-2 
Chilled water system, 1-9-1-10 
CLN. See Cluster number 
Closed-loop 

dielectric-coolant system, 1-9-1-10 

refrigerant system, 1-9-1-10 
CLS-UX product, 6-8 
Cluster channel connections, 3-2 
Cluster number 

field, 2-32 

set instruction, 2-78 
Clusters 

components of, 3-1 

interprocessor communication section, 2-3 

IOS-E, 1-5, 3-1-3-6 
C90 mode and Y-MP mode differences, 

2-49-2-51 
Coefficients, adding. See Algorithms 
Common lisp, 6-6 
Communications software, 6-7-6-9 
Computation section, CPU, 2-4-2-25 
Concurrent operations, chaining, 2-40. See 
Multiprocessing and Functional unit 
independence 
Conditional branch instructions, 2-75-2-76 
Configurations. See also Alternate-path, 
Daisy-chain, and Single-port 

DS-41A field-upgradable, 5-9 

minimum, 1-2 
Conflicts, central memory access, 2-38 
Control section, CPU, 2-25-2-33 
Conventions 

instruction, 2-45-2-46 

manual, vi 
Conversion, floating-point to decimal, 2-16. See 
also Normalized floating-point numbers 
Coolant flow monitoring, 1-11 
Cooling 

and power equipment, 1-8-1-9 

systems, 1-9 
Cpp, 6-4 



CPU 

computation section, 2-4-2-25 
control section, 2-25-2-33 
instruction summary, 2-54-2-79 
overview, 1-3 

programmable clock, 2-33, 2-78 
shared resources, 1-3, 2-1-2-4 

Cray Ada Environment, 6-5 

Cray Allegro CL, 6-6 

Cray assembler, 6-5 

Cray Assembly Language, 6-5 

CRAY Y-MP C90 
components, 1-1 

cooling system block diagram, 1-9 
DSUs, 1-5 

functional unit operations, 2-11-2-25 
interrupt flags, 2-30 
interrupt modes, 2-28-2-29 
mainframe block diagram, 2-2 
mainframe specifications, 2-81—2-82 
maintenance and monitoring, 1-6 
minimum configuration, 1-2 
network interfaces, 5-12-5-15 
operating modes, 2-31 
software, 6-1 
status field bit assignments, 2-31 

CRC errors, 3-5 

CYBER station, 6-8 



Daisy-chain configuration 

DD-60, 5-4 

DS-40, 5-11 

DS-41, 5-9 
Data 

errors, 1-4 

flow (computation section), 2-4-2-5 

formats, 1-4 

storage in central memory, 2-1 

transfer, 1-4, 2-3, 2-6 

transfer registers, 2-6, 2-7 
Data base address register field, 2-27 
Data limit address register field, 2-27 
Data transfers 

mainframe to SSD-E, 4-2-4-3 

SSD-E to IOS-E, 4-3 

from S to V registers, 2-39 
DBA register, 2-27 
DC-40 functions, 5-10 
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DC-41 disk controller, 5-7, 5-8 
DCA-1 

channel adapters, 3-5 

general, 1-5, 5-10 
DCA-2 

channel adapters, 3-5 

functions, 5-2, 5-6 

general, 1-5, 5-10 
DCC-2, 5-9 
DCU, 5-9 

DC voltage monitoring, 1-11 
DD-49. See Disk drives 
DD-60. See Disk drives 
DE-60 cabinet, 5-2 

DEC VAX supercomputer gateway, 5-15 
Diagnostics, 1-6 
Dielectric coolant, 1-10 
Disk drives 

channel adapters, 3-5 

DD-49, 5-11-5-12 

DD-60, 5-1, 5-4, 5-5 

DD-61, 5-6-5-7 
Disk drive specifications 

DD-49, 5-25-5-26 

DD-60, 5-17-5-18 

DD-61, 5-19-5-20 
Disk storage units, 1-5 
Disk subsystems 

DS-40, 5-9-5-11 

DS-41, 5-7-5-9 
Disk subsystem specifications 

DS-40 and DS-40D, 5-21-5-22 

DS-41, DS-41D, and DS-41R, 5-23-5-24 
Division. See also Reciprocal approximation 

algorithm, floating point, 2-21-2-24 

general, 2-5 

integer, 2-14 
DLA register, 2-27 
Documentation. See Publications 
Double-precision numbers, 2-25 
DRAM chips, 4-1 
DS-40. See Disk subsystems 
DS-41. See Disk subsystems 
DS-41A field-upgradable configurations, 5-9 
DS-41B package, 5-9 
DSUs, 1-5 



EIM flag, 2-28, 2-29 



EIOPs 

general, 1-5 

I/O buffer, 3-3 

I/O cluster, 3-1, 3-2 
Electrical separation, 1-7 
Elements, equalizing. See Algorithms 
Equipment separation, 1-7 
Equivalence, logical, instructions, 2-70 
Error exit instructions, 2-77 
Error messages. See Special register values 
Exchange 

address set instruction, 2-77 

address (XA) register field, 2-32 

mechanism, 2-25 

package fields, 2-26-2-32 

sequence, 2-26 
Exclusive NOR function, 2-12 
Exclusive OR 

function, 2-12 

logical, instructions, 2-70 
Exponent ranges, biased and unbiased, 2-15, 

2-16 
Expression (exp), 2-55 



FEI-1 front-end interface, 5-12-5-13. See 

also Front-end interface specifications 
FEI-3 front-end interface, 5-13-5-14. See 

also Front-end interface specifications 
FEIs, 1-7 

FEX mode, 2-28, 2-29 
Fiber-optic link, 1-7, 5-13. See also FOL-3 

specifications 
Fields, exchange package, 2-26—2-32 
Field-upgradable configurations, DS-41A, 5-9 
Fixed-point operations. See Integer arithmetic 
Floating-point 

add and multiply range errors, 2-17 

arithmetic, 2-5, 2-14-2-25 

arithmetic instructions, 2-66-2-68 

computations, 1-3 

conversion to decimal, 2-16 

data format, 2-15-2-16 

functional units, 2-11-2-12 

functional unit underflow condition, 2-18 

numbers, normalized, 2-16--2-17 

range errors, 2-17--2-18 
Floating-point add functional unit, normalized 
numbers, 2-10, 2-17 
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Floating-point algorithms 

addition, 2-18-2-19 

division, 2-21--2-24 

multiplication, 2-19--2-20 
Floating-point multiply functional unit 

division algorithm, 2-21-2-24 

integer arithmetic, 2-12-2-14 

normalized numbers, 2-11, 2-17 
Floating-point reciprocal approximation. See 
also Reciprocal approximation 
functional unit 

functional unit, 2-11 

range errors, 2-18 
Fluctuations, power, 1-8 
Fluorinert Liquid, warning, 1-10 
FNX mode, 2-28, 2-29 
FOL-3 specifications, 5-29-5-32. See also 

Fiber-optic link 
Formats, instruction, 2-46-2-49 
FPE, range errors, 2-17, 2-18 
Front-end interface specifications, 5-27-5-28. 

See also FEI-1 and FEI-3 
FTAM applications, 6-7 
Functional instruction summary, 2-56 
Functional unit 

independence, 2-34, 2-37 

instruction summary, 2-55 
Functional unit operations 

approximating roots, 2-22 

biased and unbiased exponent ranges, 2-15 

24-bit integer multiply, 2-13 

32-bit integer multiply, 2-14 

floating-point add and multiply range errors, 
2-17 

floating-point arithmetic, 2-14-2-25 

floating-point data format, 2-15 

floating-point reciprocal approximation 
range errors, 2-18 

integer arithmetic, 2-12-2-14 

integer data formats, 2-13 

internal representation of a floating-point 
number, 2-16 

logical operations, 2-11-2-12 
Functional units 

address, 2-8 

general, 1-3, 2-7-2-8 

floating-point, 2-10-2-11 

population/parity/leading count, 2-8, 2-9 

scalar, 2-8-2-9 

vector, 2-9-2-11 



Graphics, raster, 5-15 



Hardware. See Pipelining and segmentation 
HCA-3 and HCA-4 channel adapters, 3-5-3-6 
HEUs, 1-8, 1-9 
fflPPI 

channel adapters, 3-5 

external channel, 5-14-5-15 
fflSP channels 

general, 1-4, 1-5 

I/O cluster, 3-1, 3-3-3-4 

transfers, 4-3 
HYPERchannel adapter, 5-14 



IBA register, 2-27 

IBM compatible tape drives and controllers, 1-5 
IFP mode, range errors, 2-17, 2-18 
ILA register, 2-27 
Inclusive OR function, 2-12 
Index calculation, 2-4, 2-6 
Instruction. See also Functional units and 
Instructions 

chaining sequence, 2-40 

differences between Y-MP mode and C90 
mode, 2-49-2-51 

execution, switching from program to 
program, 2-25 

fetch sequence, 2-33 

formats, 2-46-2-49 

issue, 1-3, 2-33 

pipelining and segmentation, 2-34 

set, 2-5 

summary, CPU, 2-54-2-79 
Instruction base address register field, 2-27 
Instruction limit address register field, 2-27 
Instructions 

bit count, 2-73-2-74 

branch, 2-74-2-77 

channel control, 2-77 

error exit, 2-77 

floating-point arithmetic, 2-66-2-68 

functional, summary, 2-56 

functional unit, summary, 2-55 

integer arithmetic, 2-64-2-65 

interprocessor interrupt, 2-61 

interregister transfer, 2-58-2-62 

logical operation, 2-68-2-72 

memory transfer, 2-62-2-64 
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monitor mode, 2-54, 2-77-2-79 

normal exit, 2-76 

operand range error interrupt, 2-78 

register entry, 2-56-2-58 

shift, 2-72-2-73 

vector, 2-39 

vector population count, 2-74 

V register transfer, 2-58, 2-60 

Write, 2-63 
Integer 

computations, 1-3 

data formats, 2-13 

product, 32-bit, 2-4 
Integer arithmetic 

address functional units, 2-8 

general, 2-5, 2-12-2-14 

instructions, 2-64-2-65 

scalar functional units, 2-8-2-9 
Interfaces, network, 5-12—5-15. See also FEI-1 

and FEI-3 
Intermediate 

address (B) registers, 2-4 

registers, 2-5 

register transfer instructions, 2-61 

scalar (T) registers, 2-4 
Internal representation of a floating-point 

number, 2-16 
Interprocessor 

communication section, 1-3, 2-3 

interrupt instructions, 2-79 
Interregister transfer instructions, 2-58-2-62 
Interrupt 

flags field, 2-29-2-30 

modes field, 2-28-2-29 
Interrupt-on-register-parity error mode. See IRP 
I/O buffers, 3-3 
I/O cluster. See Clusters 
IOP 

instructions, 3-3 

I/O cluster, 3-2-3-3 

MUX, 1-5, 3-1, 3-2 
IOR mode, 2-27 
IOS-E 

clusters, 1-5, 3-1-3-6 

data transfers, 4-3 

functions, 1-4—1-5 

specifications, 3-9-3-11 
I/O section, 1-3, 2-3 
IPR mode, 2-27, 2-28, 2-29 
IRP, 2-6, 2-7 



ISO/OSI protocol, 6-7 
Italic, use of, vi 

Iterations, based on Newton's method, 2-22, 
2-23 



Kernel, UNICOS, 6-1 



Libraries, subroutine, 6-6 
LLRC errors, 3-6 
Logical 

operations, 2-11-2-12 

operation instructions, 2-68-2-72 
LOSP channel, I/O cluster, 1-4, 3-1, 3-3-3-4 



Macrotasking, 6-2 
Mainframe 

block diagram, 2-2 

data transfers, 4-2—4-3 

specifications, 2-81-2-82 

overview, 1-3—1-4 
Maintenance, with MWS-E, 1-6 
Matrix operations, indexing for, 2-4 
Memory 

SSD-E, 4-1-4-2 

transfer instructions, 2-62-2-64 
Merge 

instructions, 2-71-2-72 

operation, 2-12 
MGSs, 1-8 
Microtasking 

features, 6-2—6-3 

overview, 2-41-2-42 
Modes field, 2-31 
Monitoring. See also WACS 

AC voltage, 1-11 

DC voltage, 1-11 

with MWS-E, 1-6 
Monitor mode instructions, 2-54, 2-77-2-79 
MPGS interactive postprocessing visualization 

tool, 6-8 
MPU card, 5-11 
Multiplication. See Algorithms and Integer 

arithmetic 
Multiprocessing 

defined, 1-1, 1-4, 2-34, 

features, 6-2-6-3 

overview, 2-41-2-42 
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Multitasking, 1-4, 2-34, 2-41--2-42 

MVS station, 6-8 

MWS-E 

functions, 1-6-1-7 

WINs, 3-6 

workstation chassis, 1-6 



Network. See also FEI-1 and FEI-3 

connections, direct, 5-14 

gateways, 6-7 

interfaces, 5-12-5-15 
Networking protocols, 6-7 
Newton's method for approximating roots, 2-21, 

2-22 
No-operation instruction. See Monitor mode 

instructions 
NOR function, exclusive, 2-12 
Normal exit instructions, 2-76 
Normalized floating-point numbers, 2-16—2-17 
Normalizing results. See Algorithms 
Notational convention for instructions, 

2-45-2-46 
Number bases, vi 



Operand range error interrupt instructions, 2-78 
Operating 

modes, 2-31 

registers, 2-5-2-7 
ORE interrupt flag, 2-27 
OR function 

exclusive, 2-12, 2-70 

inclusive, 2-12 
Overflow condition. See Range errors 
OWS-E 

functions, 1-6, 1-7 

WINs, 3-6 

workstation chassis, 1-6 



Parallel processing features, 2-34-2-42 

Parity bits, 2-6, 2-7 

Pascal, 6-5 

Performance counter instructions, 2-79 

Performance monitor, 2-33 

Peripheral devices. See Channel adapters 

PINT, 3-7 



Pipe 

functional units, 2-9 

1 functional units, 2-9 

Pipelining and segmentation, 2-34-2-36. See 

also Segmentation 
Pipes, vector operations, 2-7 
Population/parity/leading zero count functional 

unit, 2-8, 2-9 
Population parity count instructions, 2-74 
Ports, CPU, 2-1 
Power 

and cooling equipment, 1-7-1-9 

sources, 1-8 

supplies, mainframe, 1-8, 1-9 

WACS source, 1-11 
P register, CPU control section, 2-25 
PRE interrupt flag, 2-27 
Pressure monitoring, 1-10, 1-11. See also 

WACS 
Primary registers, 2-5, 2-8 
Processor number field, 2-32 
Program address register. See P register 
Program address register field, 2-26 
Programmable clock 

general, 2-33 

instructions, 2-78 
Program range, 2-54 
Publications 

software, 6-9-6-11 

training, 6-11 



RAM, central memory, 2-1 

Range errors, floating-point, 2-17—2-18 

Raster graphics, 5-15 

RCUs, 1-8, 1-9 

Read instructions, 2-64 

Real-time clock 

general, 2-4 

set instruction, 2-78 
Reciprocal approximation. See Floating-point 

algorithms 
Reciprocal approximation, floating-point 

instructions, 2-68 
Reciprocal approximation functional unit 

division algorithm, 2-21-2-24 

normalized numbers, 2-17 
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Register 

access, 2-5 

entry instructions, 2-56-2-58 

values, special, 2-53 
Register parity error flag. See RPE 
Registers 

address, 2-6 

bit designator convention, vi 

intermediate and primary, 2-5 

operating, 1-3, 2-5-2-7 

scalar, 2-6 

vector, 2-7 
Remote support, 1-7 
Reservations on V registers, 2-39 
Results, normalizing. See Algorithms 
Return jump instructions, 2-76 
RPE, 2-6, 2-7 



SBCDBD, 2-1 
SB registers, 2-3 
Scalar 

data, 2-4 

floating-point data, 2-4 

leading zero count instruction, 2-74 

memory references, 2-6 

population count instructions, 2-73 

register uses, 2-6 

segmentation and pipelining example, 2-35 

(S) registers, 2-4 

processing overview, 1-1, 1-3—1-4 
Scalar add functional unit, 2-8. See also Integer 

arithmetic 
Scalar instructions. See also Floating-point 

arithmetic 
Scalar logical functional unit, 2-8 
Scalar operations. See Floating-point functional 

units 
Scalar shift functional unit, 2-9 
SECDED, 2-1, 3-3, 4-2 
Segmentation 

functional unit, 2-7 

general, 2-34-2-36 

scalar example, 2-35 

vector example, 2-36 
Semaphore 

registers, 2-3 

register transfer instructions, 2-58 



Separation 

electrical, 1-7 

equipment, 1-7 
Shared registers, 2-3 

Shared register transfer instructions, 2-61 
Shared resources, CPU, 1-3, 2-1-2-4. See 

also Specifications 
Shift instructions, 2-72-2-73 
Single-port configuration 

DD-60, 5-3 

DS-40, 5-11 

DS-41, 5-9 
SM registers. See Semaphore 
Software 

CRAY Y-MP C90, 6-1 

publications, 6-9, 6-11 
Special 

CAL syntax forms, 2-53 

register values, 2-53 

syntax with integer arithmetic 
instructions, 2-64 
Specifications 

DD-49, 5-25-5-26 

DD-60, 5-17-5-18 

DD-61, 5-19-5-20 

DS-40 and DS-40D, 5-21-5-22 

DS-41, DS-41D, and DS-41R, 5-23-5-24 

FOL-3, 5-29-5-32 

front-end interface, 5-27—5-28 

IOS-E, 3-9-3-11 

mainframe, 2-81-2-82 

SSD-E, 4-5-4-6 
S register. See also Scalar functional units 

field, 2-32 

transfer instructions, 2-56, 
2-59-2-60 
SSD-E 

data transfers, 4-2—4-3 

memory, 4-1-4-2 

memory sizes, 4-1 

overview, 1-5 

specifications, 4-5-4-6 
Stations, 6-8 
Status 

field bit assignments, 2-31 

registers, 2-33 

registers transfer instructions, 2-61 
Storage, I/O buffer, 3-3. See also Disk drives, 

Disk subsystems, and SSD-E 
ST registers, 2-3 
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Subroutine libraries, 6-6 
SUPERLINK/MVS product, 6-8 
Swapping. See Exchange Sequence 
Syntax 

CAL, 2-53 

with integer arithmetic instructions, 2-64 
System components, 1-1 



Tape controller channel adapter, 3-6 

Tape drives and controllers, 1-5 

TCA-1, 1-5, 3-6 

TCP/IP, 6-7 

Temperature 

monitoring, 1-10, 1-11 
recommended water-supply, 1-10 

Test and set instruction, 2-3 

Training publications, 6-11 

Transients, 1-8 

T registers. See Intermediate 

Truncation. See Floating-point multiplication 
algorithm 



Unconditional branch instructions, 2-75 
Underflow condition. See Range errors 
UniChem environment, 6-8 
UNICOS, 2-42, 6-1, 6-2 
UNIX, 6-1, 6-2 
UNIX station, 6-8 
Upgrades, with OWS-E, 1-7 
USCP protocol, 6-7 
Utilities 

general, 6-6 

UNICOS, 6-2 



memory references, 2-6 
population count instruction, 2-74 
processing, 1-1, 1-3-1-4, 2-34, 2-37-2-41 
segmentation and pipelining example, 2-36 

Vector functional units, 2-9-2-10, 2-35. See 
also Floating-point functional units 

Vector length (VL) register field, 2-32 

Vector length register transfer instructions, 2-62 

Vector mask (VM) register transfer instructions, 
2-62 

Vector operations, automatic, 2-38. See also 
Floating-point functional units 

VfflSP channels, 1-5, 4-2 

VMEbus, 5-14 

VM station, 6-8 

Voltage monitoring, 1-11 

V register 

functions, 2-38-2-39 

general, 2-4, 2-7 

transfer instructions, 2-58, 2-60 

VT applications, 6-7 



WACS, 1-7, 1-10-1-11 
WINs, 1-5, 3-6-3-7 
Workstation interfaces. See WINs 
Write instructions, 2-63 



XA register field, 2-32 



Y-MP mode. See C90 mode 



Values, special register, 2-53 
Vector 

chaining example, 2-40 

data, 2-4 

defined, 2-37 

examples, 2-38 

floating-point data, 2-4 

instructions, 2-39 

leading zero count instruction, 2-74 

length (VL) register, 2-7 

mask bits format, 2-45 

mask instructions, 2-10, 2-71 

mask (VM) register, 2-7 
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